DMS — Tier 1 specification
Version: 0.1 (draft)
Sibling specification to SPEC.md. Defines the tier 1 extension: a syntactic surface that lets DMS represent element-shaped data (markup, declarative AST nodes, structured function calls) on top of the tier-0 value-tree algebra.
A tier-0-only decoder (per SPEC.md) rejects tier-1 documents at front-matter decode with a tier-1-pointing error. A tier-1-capable decoder ships the four functions described in Decoder / encoder split below; tier-0 conformance is preserved.
Pre-1.0: breaking changes are still possible. No version-bump rules apply yet.
Goal
Add a syntactic surface to DMS that lets it represent
element-shaped data (markup, declarative AST nodes, structured
function calls) without compromising the value-tree algebra
(Map / List / Scalar) that the rest of the spec rests on.
Concrete near-term use cases:
- DMS-shaped HTML / XML / SVG / JSX-like markup.
- DMS-shaped structured logging with named verbs (|event(...), |metric(...)).
- DMS-shaped IaC where named decorators map to resources (|resource("aws_instance", ...)).
What tier 1 does not add: expressions, references, includes from other files, computation. Those are separate decorator families that could be defined later as opt-in dialects, each with its own spec, but they are not part of this draft.
Tier 1 inherits all of tier-0's design non-goals (SPEC.md §"Design non-goals") unchanged:
- No null/none. Missing values are expressed by key absence, including in tier-1 decoration AST records, where optional fields are omitted rather than set to a null marker.
- No string concatenation or line continuation for single-line strings. Heredocs remain the only multi-line string mechanism.
- No unit suffixes.
- No references / interpolation / schemas.
- No anchors / aliases / type tags.
If a tier-1 dialect's runtime layer wants to introduce any of
these (e.g., an i18n dialect doing key resolution at render
time), that's the dialect's contract with its consumers — DMS
itself stays on the same algebra.
Core concept: decorators as AST decoration
A tier-1 line can carry a decorator call:
+ |tag(lang: "en")
The decorator (|tag(...)) is AST decoration, not value-tree
content. The decoded document keeps two parallel structures:
- Value tree: pure Map / List / Scalar. Same algebra as tier 0. Generic tooling (diff, patch, JSON-Schema, lite-mode parser, pretty-printer) operates on this without knowing tier 1 exists.
- Decoration sidecar: parallel structure addressing each decorated value-tree node by path. Records every decorator call (and every comment, reusing the same machinery).
The decorators sidecar is to |-calls what the comments AST
is to # ... lines today: parallel, path-keyed, captured
during full-mode decode, dropped in lite mode, reattached on
encode for byte-stable round-trip. Its canonical shape is a
list of per-node entries — see "AST shape" for details.
Sigil categories
Two disjoint roles:
- _ is reserved for core / built-in decorators. Existing tier-0 heredoc modifiers (_trim, _fold_paragraphs) keep working unchanged. Future spec-defined core decorators across any tier also use _.
- Tier 0 reserves a fixed set of decorator sigils that no tier-0 document may use as a line-start character. Tier-1 imports bind sigils from this set to dialect families.
Tier-0 reserved decorator sigil set
Tier 0 spec adds the following characters as reserved decorator sigils:
! @ $ % ^ & * | ~ ` . , > < ? ; =
Tier 0 also reserves the Reserved Emoji Set (see SPEC.md
§Lexical → "Reserved emoji characters"): every extended grapheme
cluster containing at least one codepoint from
Extended_Pictographic ∪ Regional Indicators ∪ Emoji Modifiers
∪ {U+20E3}. Emoji-bearing grapheme clusters are first-class
decorator sigils alongside the ASCII set.
Neither kind of character appears in a valid tier-0 bare key, so the reservation breaks no existing tier-0 document. The tier-0 spec rejects them at every value-position (see SPEC.md §"Implicit reservations"):
| Position | Tier-0 status | Tier-1 use |
|---|---|---|
| First non-whitespace of a body line | parse error | decorator at scalar root or leading |
| After key:, before inline_value | parse error | inner decoration on kvpair value |
| After +, before inline_value | parse error | inner decoration on list-item value |
| After inline_value on kvpair line, pre-NL | parse error | trailing decoration on kvpair value |
| After inline_value on list-item line, pre-NL | parse error | trailing decoration on list-item value |
| Inside flow_array, before/after element | parse error | inner/trailing decoration on flow element |
| Inside flow_table, before/after value | parse error | inner/trailing decoration on flow_kv |
The reservation positions are exactly the value-positions where tier 1 places decoration. Tier-1 decoration "fills in" the slots tier 0 already keeps empty.
This is a fixed list in the tier-0 spec, not a dynamic per-file reservation. Benefits:
- Lexer is stable across files. Tokenizing a tier-0 line doesn't depend on whether or what front-matter declares; a reserved sigil is a parse error regardless of the document's tier.
- Forward compatibility. Tier-0 decoders reject tier-1 decorator usage explicitly, with an actionable error ("decorator sigil | requires tier 1; set _dms_tier: 1 and declare the dialect in _dms_imports").
- Cross-file tooling stays grep-able. A reserved sigil always means the same thing across the corpus: a decorator call. Search and refactor tools don't need per-file context.
The tier-0 spec also reserves _-prefixed root keys in
front matter (existing behavior); _dms_imports and any future
_dms_* field name is covered by that.
Multi-character sigils
A sigil is a non-empty sequence of sigil atoms. A sigil atom is one of:
- a single ASCII char from the reserved decorator sigil set above, or
- one extended grapheme cluster from the Reserved Emoji Set.
Single-atom sigils (|, @, 🚀, 🇺🇸) are the common case;
multi-atom sigils (||, |@, ~~, &|*, 🚀🔥, |🚀) are
the escape valve when a file imports more dialects/families
than single-atom sigils can accommodate. Capacity scales as
N + N² + N³ + …, effectively unlimited; the addition of
the Reserved Emoji Set increases N from 17 to ~3.7k.
Sigils may only combine atoms drawn from the reserved sets
above. Cross-set combinations with non-reserved chars like |+
are not valid sigils — + is the tier-0 list-item marker, and
admitting it into sigil position would reintroduce parse
ambiguity that tier 0 avoids by construction. ASCII-reserved
chars and emoji clusters may be mixed within a single sigil
(|🚀, 🚀|, @🇺🇸@); both kinds are tier-0-reserved at the
same parse positions, so the resulting tokenization stays
context-free.
Lexer rule: longest match. At decoration position, the lexer
reads the maximal run of sigil atoms — a sequence of ASCII
reserved-sigil chars and/or Reserved-Emoji grapheme clusters,
in any order, with no intervening non-reserved characters — and
matches it against the file's bound-sigil table (built from each
import's bind, plus dialect defaults). The longest registered
prefix of that run is the sigil; remaining sigil atoms (if any)
are a parse error. If the run has no registered prefix, the
error is "unknown sigil '
Concretely: if a file binds both | and ||, then ||tag
lexes as (||, tag) and |tag lexes as (|, tag) — no
ambiguity. If a file binds only |, then ||tag is a parse
error (the lexer would read || as the candidate, find no ||
binding, and reject it rather than fall back to | followed by
literal |tag). Likewise, if a file binds 🚀 to a family,
🚀tag lexes as (🚀, tag); 🚀🔥tag is a parse error unless
🚀🔥 is also bound.
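The longest-match rule can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not a reference implementation: lex_sigil and SigilError are hypothetical names, the wording of the trailing-atom error is invented (the text above only says it is a parse error), and only the ASCII reserved set is handled. A real lexer would also segment Reserved-Emoji grapheme clusters per UAX #29, which the Python standard library does not provide.

```python
# ASCII reserved decorator sigil set, copied from the tier-0 list above.
ASCII_SIGIL_ATOMS = frozenset("!@$%^&*|~`.,><?;=")


class SigilError(ValueError):
    """Hypothetical error type for sigil lexing failures."""


def lex_sigil(text, bound_sigils):
    """Return (sigil, rest) for text that starts at a decoration position.

    ASCII-only sketch: emoji grapheme clusters are not handled here.
    """
    # Step 1: read the maximal run of sigil atoms.
    end = 0
    while end < len(text) and text[end] in ASCII_SIGIL_ATOMS:
        end += 1
    run = text[:end]
    if not run:
        raise SigilError("no sigil at decoration position")

    # Step 2: find the longest registered prefix of that run.
    for cut in range(len(run), 0, -1):
        if run[:cut] in bound_sigils:
            if cut < len(run):
                # Leftover atoms are an error; there is no fallback to a
                # shorter sigil followed by literal text.
                raise SigilError(
                    f"unexpected sigil atoms {run[cut:]!r} after sigil {run[:cut]!r}"
                )
            return run, text[end:]
    raise SigilError(f"unknown sigil {run!r}")
```

With both | and || bound, lex_sigil("||tag", {"|", "||"}) yields the pair ("||", "tag"); with only | bound, the same input raises instead of falling back to the shorter sigil.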
Skin-tone, ZWJ, and keycap sequences are matched as single
atoms by the UAX #29 grapheme-cluster boundary algorithm
(frozen at 15.1.0). 👍🏽 is one atom, not two; binding 👍
does not match 👍🏽. Authors who want both must bind both
explicitly.
Front matter additions
Tier 1 adds two reserved root fields to front matter:
- _dms_tier: 1 (already reserved by SPEC v0.14) sets the tier. Required for any document that uses tier-1 features.
- _dms_imports: list of dialect imports. Required if the document uses any non-_-prefixed decorator.
Tier-0 documents (_dms_tier: 0 or no front matter) must not
contain _dms_imports. Decoder rejects such documents with an
error suggesting _dms_tier: 1.
Import shape
+++
_dms_tier: 1
_dms_imports:
  + dialect: "html"
    version: "1.0.0"
    ns: "html"
    bind:
      "|": ["tag", "entity"]
    deny:
      tag: ["script", "iframe"]
    alias:
      tag:
        para: "p"
+++
Per import: two required identity fields, plus four optional sub-maps, each handling one orthogonal concern.
Identity (required):
- dialect (string): the dialect name. Resolved against the decoder's installed dialect registry.
- version (string, semver): the version of the dialect to bind. Must be a valid semver string (MAJOR.MINOR.PATCH, with optional -pre.release and +build.metadata). The decoder matches the requested version against installed versions using the dialect's declared match strategy; see "Dialect versioning" under "Dialect specification contract".
Identity (optional):
- ns (string): namespace under which this dialect's decorators are accessible via fully-qualified reference. When set, calls of the form |<ns>.<fn_name>(...) resolve through this dialect explicitly. Unqualified calls still resolve through normal lookup.
Syntax (optional):
- bind (map: sigil → list of family-names): overrides the dialect's default sigil set for this file. If present, it fully replaces the defaults, so the file controls which sigils bind which families. If absent, the dialect's published defaults apply. Sigil keys must be sequences of sigil atoms: ASCII chars from the tier-0 reserved decorator sigil set and/or grapheme clusters from the Reserved Emoji Set (see "Multi-character sigils" above); family-name values must be families the dialect publishes. Multiple families may bind to the same sigil; the function name disambiguates at parse time (see Resolution rules). Single-family bindings still use list form for type consistency: "|": ["tag"], not "|": "tag".
Curation (optional, all keyed by family-name):
- allow (map: family → list of names): whitelist. Only listed names within each family are accepted; wildcard acceptance is turned off for any family with an allow entry. Families not listed in allow remain at dialect default.
- deny (map: family → list of names): blacklist. Listed names are rejected; everything else passes through wildcard. Mutually exclusive with allow for the same family: declaring both for one family is a parse error. Different families may use different modes within one import.
- alias (map: family → map: alias → canonical): local renames. Source code uses the alias; the AST records the canonical name. Aliases shadow canonical names within a family: writing the canonical name when an alias exists is a parse error, to preserve "one way to write each thing." Aliases are resolved before allow / deny checks.
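The allow / deny exclusivity rule is easy to check per import. The sketch below assumes an import has already been decoded into a plain Python dict; check_curation is a hypothetical helper name and the error wording is illustrative only.

```python
def check_curation(imp):
    """Return error strings for families that declare both allow and deny.

    `imp` is one decoded _dms_imports entry, modeled as a dict
    (a hypothetical in-memory shape, not a published API).
    """
    allow = imp.get("allow", {})
    deny = imp.get("deny", {})
    # A family may use allow or deny, never both.
    return [
        f"family '{family}' declares both allow and deny"
        for family in sorted(set(allow) & set(deny))
    ]
```

Different families may still use different modes in one import, so an import with allow on one family and deny on another produces no errors.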
Resolution rules
A decorator call appears at any value-position (see Decoration sites: every value-position). To parse one:
1. Lex the sigil. At the call's start, read the maximal run of sigil atoms: ASCII chars from the tier-0 reserved decorator sigil set and/or extended grapheme clusters from the Reserved Emoji Set, in any order, with no intervening non-reserved characters. Match the longest prefix of that run against the file's bound-sigil table (built from each import's bind, falling back to the dialect's published defaults). The matched prefix is the sigil; any unmatched trailing sigil atoms are a parse error. Lookup yields a set of (family, dialect) candidates, possibly more than one if a sigil binds multiple families.
2. Lex the name. Read the identifier after the sigil following tier-0 ident rules. If a . follows, this is a fully-qualified call (|<ns>.<name>); the namespace bypasses step 1's candidate set and resolves directly through <ns>'s import.
3. Resolve the name within candidate families:
   - Resolve through alias first: if the identifier is an alias key in a candidate family's alias map, replace it with the canonical name. Writing the canonical name directly when an alias exists is a parse error.
   - Apply curation: if allow is specified, the resolved name must appear in it; if deny is specified, the resolved name must not appear in it; otherwise the family is wildcard and any name is accepted.
   Exactly one candidate family must accept the name. Zero accepting ⇒ name not found in any family bound to '<sigil>'. More than one accepting ⇒ name '<x>' is ambiguous between families <A>, <B> under sigil '<sigil>'; qualify the call as |<ns>.<x>(...).
4. Lex the params. If the next char (no whitespace) is (, parse one param group: flow-table-shaped (named) or flow-array-shaped (positional), determined by the group's first token (see "Positional params" under "Dialect specification contract"). Then loop: while the next char is ( (no whitespace), parse another group. Otherwise the call has zero explicit param groups. Decoded shape:
   - bare |tag and empty-parens |tag() both produce params: [{}]
   - |tag(a: 1) → params: [{ a: 1 }] (named)
   - |score(95, 5) → params: [[95, 5]] (positional)
   - |tag(a: 1)(b: 2) → params: [{ a: 1 }, { b: 2 }] (two named groups)
   - |score(95, 5)(commented: true) → params: [[95, 5], { commented: true }] (positional + named)
5. Record the AST entry. { family: <canonical family>, fn: <canonical fn name>, params: [...], params_dec: [], position: <leading|inner|trailing|floating> }. fn is always the canonical name regardless of whether an alias was used at the source. params_dec is empty by default and populated only if a param value was itself prefixed with decoration (see AST shape).
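Step 3 (alias resolution, then curation, then the exactly-one-family check) can be sketched as follows. The names resolve_name and ResolutionError and the candidate dict shape are hypothetical. One assumption goes beyond the text: writing a shadowed canonical name is treated as a hard error even if another candidate family would have accepted it.

```python
class ResolutionError(ValueError):
    """Hypothetical error type for name-resolution failures."""


def resolve_name(name, sigil, candidates):
    """Resolve `name` against the families bound to `sigil`.

    Each candidate is a dict with a "family" key and optional "alias",
    "allow", and "deny" entries, already narrowed to that one family.
    Returns (family, canonical_name).
    """
    accepting = []
    for cand in candidates:
        alias = cand.get("alias", {})
        if name in alias.values() and name not in alias:
            # Assumption: a shadowed canonical name fails immediately.
            raise ResolutionError(
                f"'{name}' is aliased in family '{cand['family']}'; use the alias"
            )
        # Aliases are resolved before allow / deny checks.
        canonical = alias.get(name, name)
        if "allow" in cand:
            accepted = canonical in cand["allow"]
        elif "deny" in cand:
            accepted = canonical not in cand["deny"]
        else:
            accepted = True  # wildcard family: any name is accepted
        if accepted:
            accepting.append((cand["family"], canonical))

    if not accepting:
        raise ResolutionError(f"name not found in any family bound to '{sigil}'")
    if len(accepting) > 1:
        families = ", ".join(family for family, _ in accepting)
        raise ResolutionError(
            f"name '{name}' is ambiguous between families {families} "
            f"under sigil '{sigil}'"
        )
    return accepting[0]
```

Note that a deny-mode family is still wildcard for everything it does not list, so a name allowed by one family and merely not denied by another is ambiguous and needs the fully-qualified form.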
Conflict detection
After all _dms_imports entries are merged (shallow), the
decoder runs two layers of conflict checks.
Front-matter-time: cross-import family collisions
For each (sigil, namespace, family) triple, two different
imports binding the same triple is a hard parse error:
Decoder error in front matter (line 12):
  Decorator binding collision on (sigil='|', ns=<unset>, family='tag'):
    - import #0: dialect 'html' v1.0.0 binds '|' → 'tag'
    - import #1: dialect 'math' v0.1.0 binds '|' → 'tag'
  Resolve by remapping one. Suggestion (rebinding 'math' to '@'):
    _dms_imports:
      + { dialect: "html", version: "1.0.0", ns: "html" }
      + { dialect: "math", version: "0.1.0", ns: "math",
          bind: { "@": ["tag"] } }
Heuristic for which dialect the error suggests remapping: the later one in the imports list, since the earlier import is more likely the file's "primary" dialect.
A single sigil binding multiple families from the same import
is not a collision ("|": ["tag", "entity"] is fine).
Same-sigil multi-family bindings across different imports are
also fine as long as no (sigil, ns, family) triple repeats;
function-name disambiguation handles the rest at parse time.
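The front-matter-time check amounts to grouping bindings by (sigil, ns, family) and reporting any triple claimed by more than one import. A minimal sketch, assuming each import is a dict whose bind map already has dialect defaults expanded into it; find_binding_collisions is a hypothetical name.

```python
from collections import defaultdict


def find_binding_collisions(imports):
    """Return {(sigil, ns, family): [import indexes]} for repeated triples.

    Each import is a dict with an optional "ns" and a "bind" map
    (sigil -> list of family names). None models an unset namespace.
    """
    seen = defaultdict(list)
    for index, imp in enumerate(imports):
        ns = imp.get("ns")
        for sigil, families in imp.get("bind", {}).items():
            for family in families:
                owners = seen[(sigil, ns, family)]
                if index not in owners:  # ignore repeats inside one import
                    owners.append(index)
    return {triple: owners for triple, owners in seen.items() if len(owners) > 1}
```

Because the later index is last in each list, a caller can follow the heuristic above and suggest remapping that import.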
Parse-time: same-sigil function-name ambiguity
Different families bound to the same sigil may publish
overlapping function names. The decoder cannot enumerate names
ahead of time when one or both families are wildcard, so this
class of conflict is caught at body parse — see step 3 of
Resolution rules. The error message points the user toward
fully-qualified |<ns>.<name>(...) form.
Sigil validation
Every key in any bind map must be a non-empty sequence of
sigil atoms — ASCII chars drawn from the tier-0 reserved
decorator sigil set and/or extended grapheme clusters drawn
from the Reserved Emoji Set (see "Tier-0 reserved decorator
sigil set" and "Multi-character sigils" above for the canonical
sources). A bind key containing any character outside the union
of those two sets is a hard parse error at front-matter decode.
(Note: underscore _ is not in either set; it is its own
category, reserved for core / built-in decorators. Characters
that have Emoji=Yes but are not in the Reserved Emoji Set, such
as # and the digits, are not sigil atoms either. * also has
Emoji=Yes and is likewise outside the Reserved Emoji Set, but it
is a sigil atom through the ASCII reserved set. ©, ®, and ™ are
sigil atoms because Unicode classifies them as
Extended_Pictographic=Yes.)
AST shape
The decorators sidecar on Document_t1 is a list of entries,
one per decorated value-tree node, in source order:
decorators:
  -
    path: [0]
    "|":
      -
        family: "tag"
        fn: "html"
        params: [{ lang: "en" }]
        params_dec: []
        position: "leading"
    comments:
      -
        kind: "leading"
        text: "page root"
Shape: List< { path, <kind>: List<Record>, ... } >.
The path field identifies which value-tree node the entry
attaches to. Path is a list of typed segments:
- String segment → map key (any unicode key works without escaping, since segments are typed values, not stringified).
- Integer segment → list index (zero-based).
- Empty list [] → the document root.
Examples: [] is the root; ["body"] is the value at map key
body; ["body", "children", 0] is the first element of the
list at body.children.
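Typed segments make path lookup a simple walk. The sketch below models Map as a Python dict and List as a Python list, which is an assumption about the host representation; resolve_path is a hypothetical helper name.

```python
def resolve_path(tree, path):
    """Walk a value tree along typed path segments and return the node."""
    node = tree
    for segment in path:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(segment, bool):
            raise TypeError(f"invalid path segment: {segment!r}")
        if isinstance(segment, str):
            if not isinstance(node, dict):
                raise TypeError(f"string segment {segment!r} needs a map")
            node = node[segment]
        elif isinstance(segment, int):
            if not isinstance(node, list):
                raise TypeError(f"integer segment {segment} needs a list")
            if segment < 0:
                raise IndexError("list indexes are zero-based and non-negative")
            node = node[segment]
        else:
            raise TypeError(f"invalid path segment: {segment!r}")
    return node
```

Since the string "0" and the integer 0 are different segment types, a map key that happens to look like a number never collides with a list index.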
Each entry beyond path is keyed by decoration kind. Each
value is an array of decoration records of that kind, in
source order. Per-kind shapes:
- | (or whichever sigil the file binds): array of { family, fn, params, params_dec, position } records, in source order.
  - family: canonical family name from the dialect (e.g., "tag").
  - fn: the function name written after the sigil (e.g., "html"), recorded in canonical form (post-alias-resolution). Always present.
  - params: ordered list of param groups. Each group is either a Map (named, flow-table-shaped) or a List (positional, flow-array-shaped). See "Positional params" under "Dialect specification contract" for the lexer rule that determines mode at the open (. Decoded shapes:
    - |tag (bare, no parens) → [{}]
    - |tag() (empty parens) → [{}] (equivalent to bare)
    - |tag(a: 1) → [{ a: 1 }] (named)
    - |tag(a: 1)(b: 2) → [{ a: 1 }, { b: 2 }] (two named)
    - |score(95, 5) → [[95, 5]] (positional)
    - |score(95, 5)(commented: true) → [[95, 5], { commented: true }] (mixed groups)
  - params_dec: parallel sidecar for nested decoration on param values. Same shape as the top-level decorators list (a List< { path, <kind>: ... } >), but rooted at this call's params. Path segments reach into the param groups: the integer first segment selects the group index, then subsequent segments descend into that group's value. The second segment's type reflects the group's shape: string for named ({}) groups, integer for positional ([]) groups. Examples:
    - path: [0, "src", "url"] decorates params[0].src.url (named group at index 0).
    - path: [0, 1, "url"] decorates params[0][1].url (positional group at index 0, second element, then a named-key descent into its value).
    Empty list [] when no nested decoration exists.
  - position: which of the four decoration positions the call occupied at source: "leading", "inner", "trailing", or "floating". Mirrors the kind field on tier-0 comment records. The path already identifies which node the decoration attaches to; position records how it was written so encoders can round-trip back to the same form.
- comments: array of { kind, text } records, where kind ∈ { "leading", "trailing", "inner", "floating" } (existing tier-0 attachment positions).
- future kinds (e.g., "original_forms") follow the same shape: a kind key, an ordered array of records of that kind.
A given path must appear at most once in the list. All decoration kinds attached to the same node share one entry. The list-of-entries form is canonical because: (1) it preserves source order without depending on map iteration semantics, and (2) it lets path segments stay typed values rather than baking an escape grammar for arbitrary unicode map keys into a path string.
Identity vs param data: separate AST slots
The decoration record's identity / machinery fields (family,
fn, params, params_dec, position) live at the top
level of the record. params is a separate slot beneath that
level. Function identity lives only in fn; nothing else
does.
This means user-written param keys — name, id, class,
even fn itself — are plain data with no AST-machinery
meaning. There is no collision between the AST's identity slots
and the user's param keys, regardless of what the user writes:
|form(name: "x", fn: "y")
decodes to:
{ "family": "tag", "fn": "form",
"params": [{ "name": "x", "fn": "y" }] }
The AST record's fn holds "form" (the function name from
source). params[0].fn holds "y" (user data). They never
overwrite each other.
The only reserved param key per family is the one declared
as content_slot — that key triggers hoisting and never appears
in params after decode (its value moves to the value tree).
Everything else is plain user data.
Lite-mode behavior
Lite mode drops the decorators sidecar wholesale, the same way
tier-0 lite mode drops the comments sidecar.
For documents whose meaning depends on decorator content (markup docs, anything element-shaped), this is semantically lossy in a way tier-0 lite-mode is not. Comments are advisory; tier-1 decorators are load-bearing. The tier-1 spec must state:
Comments are advisory; consumers may discard them without loss of document meaning. Tier-1 decorators are structural; consumers that discard them must not claim to preserve document semantics.
Decoration attachment
Tier-1 decoration mirrors tier-0 comment attachment exactly. There are four positions; they share source-location rules with tier-0 comments (SPEC.md §"Comments → attachment").
| Position | Source location | Attaches at | Stacks? |
|---|---|---|---|
| Leading | Own line(s) immediately before a kvpair / list-item, no blank line between | The following node | Yes |
| Inner | Between key: (or +) and the value, same line | The node's value | Yes |
| Trailing | After the value, same line, before newline | The node's value | Yes |
| Floating | Own line(s), blank-line-separated, or after the last child of a block | The enclosing block | Yes |
Path-keying: leading, inner, and trailing all attach at the same
path (the node's value path). Floating attaches one level up
(the parent container's path). The position field on each
decoration record (analogous to comments' kind field) records
which of the four was used, for round-trip fidelity.
Decorator call syntax
decorator_call = sigil name [ "." name ] { "(" [ flow_kvs ] ")" }
sigil = <one of the bound sigils for this file>
name = ident (* tier-0 ident rules *)
flow_kvs = flow_kv { "," flow_kv } (* tier-0 flow_kv *)
- The name [ "." name ] form covers fully-qualified calls (|html.tag etc.) used to disambiguate between dialects.
- Parens are optional. |tag and |tag() are equivalent; both produce params: [{}]. The lexer reads the name; if the next char (no whitespace) is (, parse param group(s), otherwise the call has no explicit params. See AST shape.
- Multiple back-to-back param groups (|tag(a: 1)(b: 2)) are allowed; each becomes a separate entry in params. Dialects that don't use multi-group calls treat the second group as a parse error at the dialect layer.
Examples by position
# leading — own line(s) before a node
|cached_for("1h")
|requires_auth
endpoint: "/api/users"
# inner — between header and value, same line
endpoint: |cached_for("1h") "/api/users"
endpoint: |required # inner-only — value is dialect default
+ |row(class: "header") name: "Alice" # inner on list-item kvpair-continuation form
# trailing — after the value, same line
port: 5432 |validates_range(1024, 65535)
+ "first" |emphasis
# floating — blank-line-separated, attaches to enclosing block
servers:
  + name: "web1"
  + name: "web2"

  |section_status("paused")
Stacking and interleaving with comments
Each position accepts an arbitrary number of decorations and comments, in source order. Decorations and comments may interleave freely:
a: "A" /* before |dec1 */ |dec1() |dec2() # this is a trailing line comment
Decoded as four trailing entries on a, in this order:
1. trailing block comment "before |dec1"
2. trailing decorator |dec1()
3. trailing decorator |dec2()
4. trailing line comment "this is a trailing line comment"
Rule (inherited from tier-0 comments): any # or // line
comment must come last — line comments consume to end-of-line,
so anything after them isn't part of the same slot. Decorators
and /* … */ block comments don't consume EOL and can appear in
any order before the line comment.
This rule applies wherever line comments are syntactically
possible (trailing, leading, floating). Inner position is
between header and value on the same line; tier-0 already
forbids # / // there (they'd consume the value).
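The ordering rule falls out of how a trailing run is scanned: block comments and decorator calls are bounded tokens, while a line comment swallows the rest of the line. The sketch below is a deliberately simplified scanner with a hypothetical name (split_trailing_run). It assumes a single bound sigil and assumes no ) characters or nested parentheses inside param groups, which a real lexer would handle through the tier-0 value grammar.

```python
def split_trailing_run(run, sigil="|"):
    """Split the text after a value into (kind, text) entries in source order.

    Simplified sketch: param groups are closed at the first ')'.
    """
    entries = []
    i, n = 0, len(run)
    while i < n:
        if run[i].isspace():
            i += 1
        elif run.startswith("/*", i):
            close = run.find("*/", i + 2)
            if close == -1:
                raise ValueError("unterminated block comment")
            entries.append(("block_comment", run[i + 2:close].strip()))
            i = close + 2
        elif run[i] == "#" or run.startswith("//", i):
            skip = 1 if run[i] == "#" else 2
            # A line comment consumes to end-of-line, so it is always last.
            entries.append(("line_comment", run[i + skip:].strip()))
            break
        elif run.startswith(sigil, i):
            j = i + len(sigil)
            while j < n and (run[j].isalnum() or run[j] in "_."):
                j += 1
            # Zero or more param groups, with no whitespace before each "(".
            while j < n and run[j] == "(":
                close = run.find(")", j)
                if close == -1:
                    raise ValueError("unterminated param group")
                j = close + 1
            entries.append(("decorator", run[i:j]))
            i = j
        else:
            raise ValueError(f"unexpected character {run[i]!r}")
    return entries
```

Applied to the text after "A" in the example line above, this returns the block comment, the two decorators, and the line comment, in that order.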
Decoration sites: every value-position
The grammar additions, by surrounding production:
decoration = decorator_call | line_comment | block_comment
inner_run = decoration { whitespace decoration }
trailing_run = decoration { whitespace decoration }
leading_block = ( decoration NEWLINE )+ (* no blank line before next node *)
decorated_value_t1 = inner_run? base_value? trailing_run?
; (* at least one of inner_run or base_value must be present *)
scalar_root_t1 = leading_block? decorated_value_t1
kvpair_t1 = leading_block? key ":" decorated_value_t1
list_item_t1 = leading_block? "+" decorated_value_t1
flow_array_t1 = "[" [ decorated_inline_t1 { "," decorated_inline_t1 } ] "]"
flow_table_t1 = "{" [ flow_kv_t1 { "," flow_kv_t1 } ] "}"
flow_kv_t1 = key ":" decorated_inline_t1
decorated_inline_t1 = inner_run? inline_value trailing_run?
; (* flow forms have no leading/floating positions *)
base_value covers whatever the surrounding production already
admits (e.g., child_block is available after + and key:,
not inside flow forms — same as tier 0).
Floating decoration follows tier-0's floating-comment rules and attaches at the enclosing container's path, identically to how floating comments attach.
Decoration-only (no base_value)
When inner_run is present but base_value is absent, the
value at that path is the dialect-specified empty default for
the family of the first inner decorator on that line:
+ |meta(charset: "UTF-8") # value resolves to html.tag's empty default ({})
+ |link(rel: "stylesheet", href: "x")
key: |required # value resolves to required's family empty default
Each dialect publishes a per-family empty default as part of its
registration contract (typical defaults: empty table {} for
record-shaped families like tag, empty list [] for
collection-shaped families).
If no inner decoration is present and no base_value is written,
that's a tier-0 parse error as today (e.g., bare + with no
continuation). Trailing decoration without a base_value is
syntactically impossible — there's no value-position for it to
sit after.
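The decoration-only rule can be sketched as a small value-selection function. resolve_value is a hypothetical name, and the family "items" in the usage below is a made-up collection-shaped family. The deep copy is a design choice of this sketch so that two nodes never share one mutable default.

```python
import copy

_MISSING = object()  # sentinel: distinguishes "no value written" from any value


def resolve_value(inner_decorators, empty_defaults, base_value=_MISSING):
    """Return the value-tree value for one line.

    inner_decorators: decoration records (dicts with a "family" key),
    in source order. empty_defaults: family name -> published default.
    """
    if base_value is not _MISSING:
        return base_value
    if not inner_decorators:
        # Neither a value nor inner decoration: tier-0 parse error.
        raise ValueError("line has no value and no inner decoration")
    # The family of the first inner decorator picks the default.
    family = inner_decorators[0]["family"]
    return copy.deepcopy(empty_defaults[family])
```

For example, with empty_defaults = {"tag": {}, "items": []}, a line carrying only an inner tag-family decorator resolves to a fresh empty table.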
Indent-block role: unchanged from tier 0
Because tier 1 only adds decoration positions around values, the
question "what does an indent block under this line mean?" is
answered exactly by tier-0 productions, applied to whatever
base_value the line carries (or the dialect-empty default if
inner-only):
| Line shape | Indent-block role |
|---|---|
| key: <inner?> <inline> <trailing?> | (no indent allowed; leaf) |
| key: <inner?> | child_block is the value of key |
| + <inner?> <inline> <trailing?> | (no indent allowed; leaf) |
| + <inner?> key: <inline> | sibling kvpairs of the same record |
| + <inner?> | child_block is the list-item's value |
Tier 1 does not add a new indent-block opener. The tier-0 discriminator stays binary: a line either opens a block (no inline value present) or it doesn't (inline value present). Decoration sits orthogonally and never affects which case applies.
Content hoisting
Children of a tier-1 element can be written in two equivalent forms — block (indent-block) or flow (a content-slot param):
# Block form
+ |p(class: "lede")
  + "Click "
  + |a(href: "/x") "here"
  + " to read."
# Flow form (semantically equivalent)
+ |p(class: "lede", children: ["Click ", |a(href: "/x") "here", " to read."])
Both decode to the same value tree. The decode pipeline hoists the content-slot param into the value-tree position the decoration is attached to, so consumers find children at one place — the value tree — regardless of which source form was used.
Per-family content-slot declaration
The content-slot param name is not hardcoded. Each family in the dialect spec optionally declares its content-slot name as part of its registration contract:
- html.tag declares content slot "children" (chosen because no standard HTML element has a literal children="..." attribute; "content" would collide with <meta name="..." content="...">)
- A markdown-flavored dialect might declare "body"
- An events dialect might declare "payload"
- Families with no value-bearing children (e.g., i18n.message, validators.required) declare no content slot and never hoist
When a family has no declared content slot, no param name
triggers hoisting — content (or any other key) is just an
ordinary param.
Hoist pass
Tier-1 decode runs the hoist pass after the body parse:
1. Parse front matter, resolve dialect imports and bound sigils.
2. Parse body → raw AST where decorator params are intact maps, no value-tree promotion yet.
3. Hoist pass. For each decoration record:
   - Look up its family in the dialect.
   - If the family declares a content slot and the param map contains that key, move the slot's value out of params[N][slot] and into the value-tree position the decoration attaches to.
   - The remaining keys in params[N] stay as decoration.
4. Apply other tier-1 normalizations (decoration-only → dialect-empty default, params_dec for nested decoration, etc.).
Hoisting is tier-1-only. Tier-0 docs have no decorators, so the pass is a no-op on them.
Conflict: both forms present
Specifying content via both the declared content_slot param
(children: for HTML, whatever the dialect declared for other
families) and an indent block on the same line is a parse
error:
+ |p(children: ["a"]) # ← parse error
  + "b"
Decoder error at line N:
  Element |p has content specified via both 'children:' parameter
  and indent block. Pick one.
No magic merging or override semantics. Pick one form per node.
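The hoist step and the both-forms conflict can be sketched for a single decoration record as follows. hoist is a hypothetical name and the record is modeled as a plain dict. Rejecting a content slot that appears in more than one param group is an assumption of this sketch; the text above does not address that case.

```python
def hoist(record, content_slot, has_indent_block=False):
    """Split one decoration record into (new_record, hoisted_values).

    content_slot is the family's declared slot name, or None.
    hoisted_values holds zero or one value destined for the value tree.
    The input record is not modified.
    """
    if content_slot is None:
        return record, []  # family never hoists
    hoisted = []
    new_groups = []
    for group in record["params"]:
        # Only named (map-shaped) groups can carry the slot key.
        if isinstance(group, dict) and content_slot in group:
            if has_indent_block:
                raise ValueError(
                    f"Element |{record['fn']} has content specified via both "
                    f"'{content_slot}:' parameter and indent block. Pick one."
                )
            if hoisted:
                # Assumption: a repeated slot key is rejected, not merged.
                raise ValueError("content slot appears in more than one param group")
            hoisted.append(group[content_slot])
            group = {k: v for k, v in group.items() if k != content_slot}
        new_groups.append(group)
    return {**record, "params": new_groups}, hoisted
```

After hoisting, the slot key no longer appears in params, and every other key stays in place as decoration.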
Encoder canonical form
The decoder collapses both source forms to the same value tree; the encoder must choose which form to re-emit. Heuristic, deterministic from content shape:
- Block form when the value is a list with > 1 element, OR any element is itself a non-scalar (a list, table, or decorated value with non-trivial content).
- Flow form when the value is a short inline span — a list with ≤ 1 element, OR all elements are scalars / decorated scalars (the typical mixed-text-with-spans case).
- Block form when the value is a non-list (a table — block is the only way to express a table value-tree position ergonomically).
- No content emission when the value matches the family's empty default (e.g., <meta>-shaped self-closing elements).
This rule rewrites source — |p(children: ["one"]) decodes and
re-encodes as block form. That's the same kind of canonicalizer
behavior tier-0 encoders already do for things like quote style.
A future revision can add per-node form preservation (sidecar
original_form marker) if real users find rewriting jarring.
Dialect specification contract
A dialect publishes a structured specification that the decoder loads at registration time. The spec is the cross-port source of truth — each port translates it into its native registration format, but the contract (what the decoder validates, what canonical names exist, etc.) is identical across ports.
The spec contains four kinds of declarations: families, param signatures (per family), named structs, and the dialect's version-match rule (covered in "Import shape").
Families
Each family the dialect publishes:
- name: canonical family name (e.g. "tag", "entity").
- default_sigils: sigils this family binds to in the absence of a per-file bind override. List form, even for single-sigil defaults.
- empty_default: the value-tree value the decoder uses for decoration-only positions (no following base_value). Typically {} for record-shaped families, [] for list-shaped families, "" for string-shaped, etc.
- content_slot (optional): name of the param that gets hoisted into the value tree on decode (Content hoisting). Omit if the family has no value-bearing children.
Naming guidance: pick a slot name that does not conflict with
any valid attribute name for the family. HTML's tag family
uses "children" — "content" was the obvious choice but
collides with <meta name="..." content="...">, where
content is a literal HTML attribute. "children" has no
collision in standard HTML. The hoist mechanism owns one
canonical slot name per family; every other key flows through
params unchanged. If a dialect can't find a non-colliding
name, fall back to a prefixed form ("_children",
"__body__") — verbose but unambiguous.
- params (optional) — param signature for this family
(see below).
Param signatures
Three modes per family:
- strict — only declared keys accepted; unknown keys are parse errors. Required keys must be present.
- wildcard_with_typed — any keys accepted; declared keys are type-checked. Required keys must be present.
- wildcard — any keys accepted; no checks. Default if no params block is declared. Equivalent to today's behavior.
Per-family params block structure:
params:
  mode: "wildcard_with_typed"
  typed:
    class: { type: "string" }
    hidden: { type: "boolean", default: false }
    children: { type: "list_of any" }
  required: ["id"]
Validation is family-level only. Per-function tightening
(e.g. HTML's <input> requires type, <span> does not) lives
in the dialect's runtime / render layer, not in the DMS decoder.
This keeps the decoder's job small and the spec testable across
ports.
Param values themselves can be any tier-0 inline_value shape —
scalar, flow_array, or flow_table — or a decorated value
(|inner(...) nested in another decorator's params, resolved
through params_dec). Validation applies to the hoisted +
nested-resolved value, after the decoder has finished
normalizing the AST.
Type vocabulary
| Type | Matches |
|---|---|
| string | tier-0 string |
| integer | tier-0 integer |
| float | tier-0 float |
| boolean | tier-0 boolean |
| datetime | tier-0 datetime |
| list_of <T> | flow_array (or hoisted block list) where every element matches <T> |
| map_of <T> | flow_table (or block table) where every value matches <T> |
| any | any value-tree shape |
| <StructName> | a map matching the named struct (see below) |
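The type vocabulary can be read as a small recursive matcher. The following is a minimal sketch, assuming a Python port that represents tier-0 values as native str / int / float / bool / datetime / list / dict; the function name matches_type and the shape of the structs argument are hypothetical, not part of the spec. It also assumes that struct maps may carry undeclared fields, which the spec text here does not settle.

```python
from datetime import datetime

def matches_type(value, type_name, structs=None):
    """Return True if `value` matches `type_name` from the type vocabulary.

    `structs` (hypothetical shape) maps struct name -> {field: {"type": ..., "required": ...}}.
    """
    structs = structs or {}
    if type_name == "any":
        return True
    if type_name.startswith("list_of "):
        inner = type_name[len("list_of "):]
        return isinstance(value, list) and all(
            matches_type(v, inner, structs) for v in value
        )
    if type_name.startswith("map_of "):
        inner = type_name[len("map_of "):]
        return isinstance(value, dict) and all(
            matches_type(v, inner, structs) for v in value.values()
        )
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "float":
        return isinstance(value, float)
    if type_name == "datetime":
        return isinstance(value, datetime)
    if type_name in structs:
        if not isinstance(value, dict):
            return False
        for field, decl in structs[type_name].items():
            if field not in value:
                if decl.get("required", False):
                    return False
                continue
            if not matches_type(value[field], decl["type"], structs):
                return False
        return True  # assumption: undeclared fields are tolerated
    raise ValueError(f"unknown type {type_name!r}")
```

Because list_of and map_of recurse on the remainder of the type string, nested forms such as "map_of list_of integer" are handled by the same two branches.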
Named structs
Dialects may declare reusable struct types referenced by name in
typed signatures:
structs:
  Address:
    street: { type: "string", required: true }
    city: { type: "string", required: true }
    zip: { type: "string" }
  ContactInfo:
    email: { type: "string" }
    home: { type: "Address" }
    work: { type: "Address" }
families:
  + name: "user_card"
    params:
      mode: "wildcard_with_typed"
      typed:
        contact: { type: "ContactInfo" }
        addresses: { type: "list_of Address" }
Each struct field has the same shape as a typed entry —
type, optional required, optional default. Structs may
reference other structs and built-in types. Cycles are a
registration-time error (the dialect's spec fails to load if a
struct references itself directly or transitively).
Struct names live in the dialect's namespace; cross-dialect
struct references are not supported in this revision. If a
file imports two dialects that both define Address, each
dialect's families resolve their own Address and there is no
shared definition.
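The registration-time cycle check amounts to a depth-first search over struct references. A minimal sketch follows, assuming the same hypothetical structs mapping shape as a Python dict (struct name to field declarations). It treats a reference wrapped in list_of / map_of as a reference for cycle purposes, which is an assumption rather than something this section states.

```python
BUILTINS = {"string", "integer", "float", "boolean", "datetime", "any"}

def find_struct_cycle(structs):
    """Return the struct names forming a reference cycle, or None if acyclic."""

    def referenced(type_name):
        # Peel list_of / map_of wrappers to reach the element type.
        while type_name.startswith(("list_of ", "map_of ")):
            type_name = type_name.split(" ", 1)[1]
        return None if type_name in BUILTINS else type_name

    state = {}  # struct name -> "visiting" | "done"

    def visit(name, stack):
        if state.get(name) == "done":
            return None
        if state.get(name) == "visiting":
            # `name` is already on the current DFS path: report the loop.
            return stack[stack.index(name):] + [name]
        state[name] = "visiting"
        for decl in structs.get(name, {}).values():
            ref = referenced(decl["type"])
            if ref is not None:
                cycle = visit(ref, stack + [name])
                if cycle:
                    return cycle
        state[name] = "done"
        return None

    for name in structs:
        cycle = visit(name, [])
        if cycle:
            return cycle
    return None
```

A port would call something like this once per dialect at registration and refuse to load the spec when the result is not None.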
Decoder validation behavior
When a decorator call is decoded, after Resolution rules and content hoisting the decoder applies the family's signature:
- If mode == "wildcard", skip validation.
- If mode == "strict", every key in the (post-hoist) param group must appear in typed. Unknown keys are parse errors.
- If mode == "wildcard_with_typed", declared typed keys are checked when present; unknown keys pass through unchecked.
- For each declared key, run type-match:
  - Built-in types match by tier-0 value-tree kind.
  - Struct types recursively validate the value as a map against the struct's field signatures.
- required keys must be present after hoisting + defaults. Missing required is a parse error.
- Defaults fill in absent keys before the AST is finalized (i.e., the decorator record's params shows the defaulted value).
Validation errors fire at decode time with path context:
Decoder error at line N, decorator |tag(...) at path [0, 1]:
Param 'class' has type integer but signature requires string.
(Path is rendered with the canonical typed-segment form — strings for map keys, integers for list indices, displayed as a list.)
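The validation steps above can be sketched for a single named group as follows. This is an illustration under stated assumptions, not a reference implementation: the names ParamError, kind_of and validate_named_group are hypothetical, the signature is passed as a plain dict mirroring the params block, and only the scalar built-in types are checked (list_of, map_of and struct types are skipped to keep the sketch short).

```python
class ParamError(Exception):
    """Raised when a named param group violates the family signature."""

SCALAR_TYPES = {"string", "integer", "float", "boolean"}

def kind_of(value):
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    return "list" if isinstance(value, list) else "map"

def validate_named_group(group, params_spec=None):
    """Return the group with defaults applied, or raise ParamError."""
    spec = params_spec or {"mode": "wildcard"}
    mode = spec.get("mode", "wildcard")
    out = dict(group)
    if mode == "wildcard":
        return out  # no checks at all
    typed = spec.get("typed", {})
    if mode == "strict":
        for key in out:
            if key not in typed:
                raise ParamError(f"Unknown param '{key}'.")
    for key, decl in typed.items():
        if key not in out:
            if "default" in decl:
                out[key] = decl["default"]  # defaults show up in the AST
            continue
        want, got = decl["type"], kind_of(out[key])
        if want in SCALAR_TYPES and got != want:
            raise ParamError(
                f"Param '{key}' has type {got} but signature requires {want}."
            )
    for key in spec.get("required", []):
        if key not in out:
            raise ParamError(f"Missing required param '{key}'.")
    return out
```

Required keys are checked last so that a key supplied by a default counts as present, matching "after hoisting + defaults".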
Positional params
A param group is either flow-table-shaped (all named) or flow-array-shaped (all positional). Single calls separate the two modes by group:
|score(95, 5) # one positional group
|tag(class: "lede") # one named group
|score(95, 5)(commented: true) # positional group, then named group
|some(5) # variant payload — positional
|emphasis "text" # base_value form (existing) still works
A param group cannot mix positional and named at the same level. The mode of a group is detected from its first token and locked in for the whole group — mixing within one group is a parse error. To mix modes, use multiple param groups.
Rationale for separation-by-group rather than Python-style "positional-then-named within one group":
- No parse ambiguity. Mode is detectable at the first token after (; no lookahead, no late re-classification.
- Lossless round-trip. Encoder knows from the AST shape which group was positional vs named and emits faithfully.
- Existing multi-group syntax gets purpose. (a)(b) was underused at named-only; positional makes it load-bearing.
- Consistent with DMS's "one form per concept" principle. No implicit-mixing magic.
Lexer rule
At the open (, the decoder peeks at the first non-whitespace
token:
| First token after ( | Group is | Parse as |
|---|---|---|
| key: (ident followed by :) | named | flow_kvs |
| ) (immediate close) | empty | [{}] (back-compat) |
| anything else | positional | flow_array_elems |
The "anything else" includes: scalars, flow forms, decorator calls, base_value-like inline values. A positional group is exactly a flow_array body without the brackets.
A key: token appearing after a positional element in the same
group, or any non-key: token after a named element, is a
parse error: "Cannot mix positional and named params in one
group; use a separate (...) group."
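The mode-detection rule can be illustrated over an already-tokenized group body. The sketch below is hypothetical throughout: it assumes the tokens between ( and ) arrive as a list of strings with whitespace removed, and it assumes an identifier shape (IDENT) that the real grammar may define differently. It shows only the classification and the mixing check, not a real lexer.

```python
import re

# Assumption: identifier shape; the actual DMS grammar may differ.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*\Z")
OPEN, CLOSE = {"(", "[", "{"}, {")", "]", "}"}
MIX_ERROR = ("Cannot mix positional and named params in one group; "
             "use a separate (...) group.")

def is_named(elem):
    """An element is named when it starts with `ident :`."""
    return len(elem) >= 2 and IDENT.match(elem[0]) is not None and elem[1] == ":"

def split_elements(tokens):
    """Split on top-level commas, ignoring commas nested in brackets."""
    elems, current, depth = [], [], 0
    for tok in tokens:
        if tok in OPEN:
            depth += 1
        elif tok in CLOSE:
            depth -= 1
        if tok == "," and depth == 0:
            elems.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        elems.append(current)
    return [e for e in elems if e]  # drop the empty slot after a trailing comma

def classify_group(tokens):
    """Return 'empty', 'named' or 'positional'; raise on a mixed group."""
    elems = split_elements(tokens)
    if not elems:
        return "empty"
    mode = "named" if is_named(elems[0]) else "positional"
    for elem in elems[1:]:
        if is_named(elem) != (mode == "named"):
            raise SyntaxError(MIX_ERROR)
    return mode
```

The mode is fixed by the first element alone; later elements are only checked against it, which mirrors the "detected from its first token and locked in" wording.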
AST shape
The params field on a decoration record becomes a list of
either Map (named group) or List (positional group):
# |tag(class: "x") decodes to:
params: [{ class: "x" }]
# |score(95, 5) decodes to:
params: [[95, 5]]
# |score(95, 5)(commented: true) decodes to:
params: [[95, 5], { commented: true }]
# |tag and |tag() both decode to:
params: [{}]
In languages with sum types: params: List<Map<String,Value> | List<Value>>.
In dynamically-typed languages: detect kind at runtime.
Per-family signature
Families that accept positional params declare a positional
block alongside the existing typed block. The mode enum
gains a fourth value:
| Mode | Positional groups | Named groups | Strict checking on names |
|---|---|---|---|
| wildcard | rejected | accepted | none |
| wildcard_with_typed | rejected | accepted | declared typed keys |
| strict | rejected | accepted | only declared typed keys |
| positional | accepted | accepted | positional slots typed; named keys per typed |
Spec example:
families:
  + name: "variant"
    default_sigils: ["|"]
    empty_default: {}
    content_slot: "value"
    params:
      mode: "positional"
      positional:
        - { name: "value", type: "any" }
      typed: {} # no named keys defined for this family
Each positional slot has:
- name — used for AST round-trip identity and error messages
- type — from the standard type vocabulary
- required (optional, default true) — slot must be present
- default (optional) — fills in absent slot before AST is finalized
- variadic (optional, default false) — see below
positional is an ordered list of slots. Element 0 of the
positional group fills slot 0, element 1 fills slot 1, etc.
Variadic positional slot
A family that accepts arbitrary-arity positional calls
(|node(a, b, c, d, e)) declares its last slot as
variadic: true. Each surplus positional element collects into
the variadic slot's list.
# A family that takes one required string label and any number
# of additional values:
positional:
  - { name: "label", type: "string", required: true }
  - { name: "args", type: "any", variadic: true }

# A family that takes only variadic args (KDL-shaped):
positional:
  - { name: "args", type: "any", variadic: true }
Rules.
1. Only the last slot may be variadic. A variadic slot followed by another slot is a registration-time error ("variadic slot 'X' must be the last positional slot in family <f>"). Forbidding mid-list variadic keeps slot assignment a single left-to-right scan with no end-counting.
2. At most one variadic slot per family. Falls out of (1).
3. Variadic slots are implicitly optional — required and default are not used on variadic slots; zero matching elements is valid and produces [].
4. Element-level typing. The slot's type describes the type of each element. The slot's collected value is implicitly a list of those elements. To accept any element type (KDL's case), set type: "any". To accept only integers (|sum(1, 2, 3, 4)), set type: "integer".
5. Surplus elements never error when variadic is present — they always have a slot to land in. Without a variadic slot, surplus elements remain a parse error per existing rules.
Validation pass — slot assignment.
For a positional group with K elements and a signature with
N slots where slot N-1 is variadic:
- Elements 0..N-2 fill the non-variadic slots in order. Type-check each against its slot's type. Apply defaults to missing optional slots in this range.
- Elements N-1..K-1 collect into the variadic slot's list. Type-check each against the variadic slot's type (element type).
- If K < N-1, missing required non-variadic slots are a parse error. The variadic slot itself can be empty.
| Signature | Call | Validates as |
|---|---|---|
| [label: string!, args: any (variadic)] | \|node("x") | label: "x", args: [] |
| [label: string!, args: any (variadic)] | \|node("x", 1, 2, 3) | label: "x", args: [1, 2, 3] |
| [label: string!, args: any (variadic)] | \|node | parse error: label required |
| [args: any (variadic)] | \|node | args: [] (no positional group) |
| [args: any (variadic)] | \|node("a", "b") | args: ["a", "b"] |
| [args: integer (variadic)] | \|node(1, 2, 3) | args: [1, 2, 3] |
| [args: integer (variadic)] | \|node(1, "two", 3) | parse error: element 1 type mismatch |
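The slot-assignment pass and the table rows above can be reproduced with a short left-to-right scan. The following is a sketch only: assign_slots and _type_ok are hypothetical names, slots are plain dicts mirroring the positional block, and type checking covers just "any", "string" and "integer" so the example stays small.

```python
def _type_ok(value, type_name):
    if type_name == "any":
        return True
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    raise ValueError(f"type {type_name!r} not covered by this sketch")

def assign_slots(elements, slots):
    """Map a flat positional group onto named slots; raise ValueError on errors."""
    variadic = slots[-1] if slots and slots[-1].get("variadic") else None
    fixed = slots[:-1] if variadic else slots
    if any(s.get("variadic") for s in fixed):
        # Registration-time rule: only the last slot may be variadic.
        raise ValueError("variadic slot must be the last positional slot")
    out = {}
    for i, slot in enumerate(fixed):
        if i < len(elements):
            if not _type_ok(elements[i], slot["type"]):
                raise ValueError(f"element {i} type mismatch")
            out[slot["name"]] = elements[i]
        elif "default" in slot:
            out[slot["name"]] = slot["default"]
        elif slot.get("required", True):
            raise ValueError(f"{slot['name']} required")
    surplus = elements[len(fixed):]
    if variadic is None:
        if surplus:
            raise ValueError("surplus positional elements")
        return out
    for offset, value in enumerate(surplus):
        if not _type_ok(value, variadic["type"]):
            raise ValueError(f"element {len(fixed) + offset} type mismatch")
    out[variadic["name"]] = list(surplus)  # zero surplus elements gives []
    return out
```

The returned dict is the structured view only; the AST keeps the flat list, so a tool would call something like this on demand rather than at decode time.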
AST shape — unchanged.
Variadic does not change the AST. The positional group stays a
flat List<Value> in params[N]:
# |node("x", 1, 2, 3) decodes to:
params: [["x", 1, 2, 3]]
The dialect's positional signature is metadata for
validation and structured access, not an AST transform. Tools
that want the structured { label: "x", args: [1, 2, 3] } view
apply the signature on top of the raw list; tools that don't —
generic walkers, sidecar inspectors, lite-mode consumers — get
the same flat List<Value> regardless of whether variadic is
declared.
Decoder cost.
Zero new lex/parse work. The positional-group lexer still
produces a flat List<Value> regardless of the family's slot
declarations. Variadic is a validation-pass rule applied after
parsing — the same pass that already iterates the list to
type-check non-variadic slots.
The cost the spec adds:
- One new descriptor field per slot (variadic: bool)
- One new validation rule (slot assignment under variadic) — shape: a single left-to-right scan, no backtracking
- One new registration-time check (variadic-slot-must-be-last)
No lexer state change, no new tokens, no new AST shape, no new streaming yield rule. Streaming behavior is identical to the existing positional-group rule (decorator-call parens are yield-suspending; yield is deferred until the close paren regardless of how many elements appear inside).
Encoder.
Encoder emits the positional group as a flat comma-separated list. No marker for the variadic boundary — the boundary is implicit: with N slots, the first N-1 elements fill the non-variadic slots and every element from index N-1 onward belongs to the variadic slot.
|node("x", 1, 2, 3) round-trips as |node("x", 1, 2, 3),
both with and without a variadic-aware encoder.
Decoder validation behavior (extended)
For each param group:
- If group is positional and family mode == "positional": validate group elements against the family's positional slots in order. Type-check each element against its slot's type. Apply defaults for absent trailing slots if not required.
- If group is positional and family mode != "positional": parse error — "Family '<f>' does not accept positional params; use named keys."
- If group is named: validate per the existing rules (wildcard / wildcard_with_typed / strict).
- If family mode == "positional": named groups still validate against typed exactly as wildcard_with_typed would. Mixed-group calls (one positional, one named) are normal.
Validation errors carry slot identity for positional groups:
Decoder error at line N, decorator |score(...) at path [0, 1]:
Positional slot 1 ('y') has type string but signature
requires integer.
Encoder canonical form
The encoder emits each group in its decoded shape:
- params: [{ k: v }] → (k: v)
- params: [[a, b]] → (a, b)
- params: [[a, b], { k: v }] → (a, b)(k: v)
Group order is preserved from decode. No re-ordering, no merging across groups, no automatic conversion (positional elements are never re-emitted as named, even when slot names exist).
Multi-line vs single-line group emission
Decorator-call parens are a flow-form region (per "Streaming / incremental decode" below), so they inherit SPEC.md's canonical multi-line layout for flow forms — close-bracket anchors the indent, members one level deeper, trailing comma on the last member. Tier 1 adds two specifics:
- Multi-line emission is not optional infrastructure for tier-1 ports. Decorator-call parens have no block-form alternative (unlike tier-0 lists / tables, which canonicalize to block form when non-empty). Block-shaped dialects routinely have groups with many keys; a single-line-only encoder produces unreadable output. Every tier-1-capable port MUST support multi-line emission for both named and positional groups.
- Mixing single-line and multi-line groups in one call is permitted. If the first group fits on one line and the second doesn't, emit single-line then multi-line:

      |resource("aws_instance", "web")(
        count: 3,
        ami: "ami-...",
        instance_type: "t2.micro",
      )

  (Where ("aws_instance", "web") is single-line and the named group is multi-line, both anchored on the call's line.)
The break threshold (when to choose multi-line) is the same as SPEC.md's flow-form rule: single-line render exceeding the port's line-width threshold, OR the group containing a value that itself renders multi-line (nested decorator call, multi-line flow form, heredoc).
Decoding accepts both forms unconditionally — decorator-call
parens are yield-suspending, so line breaks inside (...) are
invisible to the parse.
Hoisting interaction
content_slot hoisting is a named-key mechanism. A
positional group does not trigger hoisting, regardless of
whether positional slot 0's name matches content_slot.
Inline base_value continues to hoist:
- |some 5 → hoists 5 into the family's content_slot
- |some(5) → first positional param is 5; not hoisted
- |some(value: 5) → named value key; hoisted if value is the content_slot name
Three forms can produce the same value tree if value is the
content_slot AND positional slot 0's name is value. The
dialect MUST document its canonical encode form (typically
inline base_value when possible, else named, else
positional) so round-trips are stable.
Dialect versioning
Dialect versions are semver. All dialects must publish their
versions as MAJOR.MINOR.PATCH strings, with optional
pre-release (-rc.1, -alpha.2) and build-metadata (+build.7)
suffixes. This is a hard requirement — no other versioning
schemes are supported.
The dialect declares one match strategy in its canonical spec, drawn from this fixed enum:
| Strategy | Behavior |
|---|---|
| exact | Installed version equals requested version exactly. |
| caret | Same major, installed ≥ requested. (npm ^x.y.z semantics.) For 0.x.y requests, behaves as tilde — pre-1.0 minor bumps are breaking, per semver convention. |
| tilde | Same major.minor, installed patch ≥ requested patch. |
| gte | Installed version ≥ requested version. |
| any | Any installed version matches. |
Default if undeclared: caret. Standard practice; friendliest
evolution path.
The match algorithms are normative. Every port implements all five strategies identically — no per-port semantics drift.
Pre-release and build-metadata rules:
- Pre-release tags participate in match: 1.0.0-rc.1 does not match 1.0.0 under any strategy. Pre-release ordering follows semver (-alpha < -beta < -rc < release).
- Build metadata is ignored for matching: 1.0.0+linux matches 1.0.0+darwin under exact.
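Since the match algorithms are normative, a compact reference sketch may help when comparing ports. The one below is an illustration under assumptions, not the normative algorithm: satisfies, parse and the helper names are hypothetical, comparison uses semver precedence (a pre-release sorts below its release, numeric identifiers below alphanumeric ones), and the strategy table is applied literally, so how the pre-release rule interacts with the any strategy is left exactly as the table states it.

```python
import re

_SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+([0-9A-Za-z.-]+))?$"
)

def parse(version):
    """Split a semver string; build metadata is parsed but dropped."""
    m = _SEMVER.match(version)
    if not m:
        raise ValueError(f"not a semver string: {version!r}")
    major, minor, patch = (int(m.group(i)) for i in (1, 2, 3))
    pre = tuple(m.group(4).split(".")) if m.group(4) else ()
    return major, minor, patch, pre

def _pre_key(pre):
    # A release outranks every pre-release of the same major.minor.patch.
    if not pre:
        return (1,)
    return (0,) + tuple(
        (0, int(p), "") if p.isdigit() else (1, 0, p) for p in pre
    )

def _key(version):
    major, minor, patch, pre = parse(version)
    return (major, minor, patch, _pre_key(pre))

def satisfies(installed, requested, strategy="caret"):
    """Return True if `installed` satisfies `requested` under `strategy`."""
    i, r = _key(installed), _key(requested)
    if strategy == "any":
        return True
    if strategy == "exact":
        return i == r
    if strategy == "gte":
        return i >= r
    if strategy == "tilde" or (strategy == "caret" and r[0] == 0):
        # caret on a 0.x.y request behaves as tilde
        return i[:2] == r[:2] and i >= r
    if strategy == "caret":
        return i[0] == r[0] and i >= r
    raise ValueError(f"unknown version_strategy {strategy!r}")
```

For the failure example in this section, none of 1.0.0, 1.2.0 or 1.4.9 satisfies a caret request for 1.5.0 under this sketch, because each compares lower than the requested version.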
Where it lives in the dialect spec:
# Dialect canonical spec
name: "html"
version: "1.0.0"
version_strategy: "caret" # optional; defaults to "caret"
structs: ...
families: ...
File-side syntax: the file writes a plain semver string
(version: "1.0.0"); the dialect's strategy is applied. Range
specifiers in the file (npm-style ^1.0.0, ~1.0.0) are not
supported in this revision and would be a parse error if written.
Range-specifier syntax is parked as a future enhancement.
Failure mode at decode:
Decoder error in front matter: dialect 'html' v1.5.0 requested
with strategy 'caret', but installed versions [1.0.0, 1.2.0,
1.4.9] do not satisfy. Install ≥1.5.0 of html.
Registration-time validation: if a dialect spec declares a
version_strategy outside the five-value enum, the port refuses
to register the dialect and surfaces an error.
Branding & file naming
A tier-1 document that imports any dialect is no longer a plain DMS document — it's a DMS dialect document. Naming conventions:
- Brand identifier: dms+<brand> form. The + echoes EBNF / MIME dialect notation. Avoid dms-<brand> because that conflicts with port-naming convention (dms-c, dms-py, dms-rs).
- File extension: .dms.<brand_extension>. Examples: .dms.html, .dms.svg. Editors and grep can dispatch on the secondary extension without parsing front matter.
- Tier 0 stays canonical "DMS". Brands are strict supersets: every tier-1 dialect doc, with its decoration stripped, is a valid tier-0 document.
Open: dialect registry governance. Who allocates short brand names (html, markup, config)? Punted to a future registry. For now: an allocations document in the SPEC repo with PR-based additions; an x- prefix for unofficial / experimental dialects (x-mybrand); reverse-DNS namespacing available for anything else (io.flolabs.html).
Decoder / encoder split
Tier 1 introduces enough new lex / parse / sidecar machinery that mixing it into the tier-0 entry point would (a) bloat tier-0-only ports with code they don't need, and (b) muddle conformance — "does this port handle tier 1?" should be a yes/no per port, answered by which functions it ships.
Four functions, paired by tier
decode_t0(source, opts?) → Document_t0 # tier-0 only; rejects tier-1
encode_t0(doc: Document_t0, opts?) → str # tier-0 only; rejects decorations
decode_t1(source, opts?) → Document_t1 # accepts both tiers
encode_t1(doc: Document_t1, opts?) → str # accepts both tiers
The opts shape is per-port idiom (kwargs, options struct,
builder, etc.) and carries:
- mode: 'lite' | 'full' (default 'full')
- dialect_registry (for _t1 only — port-specific lookup of installed dialect implementations)
- other port-local knobs (formatter options on encode, etc.)
Tier detection
Tier is not declared by tier-0 documents. The decoder reads front matter and:
- _dms_tier absent (or front matter itself absent) → tier 0
- _dms_tier: 0 → tier 0 (legal but redundant)
- _dms_tier: 1 → tier 1
- any other value or future-tier integer → version error
A bare tier-0 document needs no declaration; the _dms_tier
field is the opt-in marker for tier ≥ 1.
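The detection rule is small enough to state as code. A minimal sketch, assuming the decoded front matter is available as a Python dict (or None when absent); detect_tier and max_tier are hypothetical names, and ValueError stands in for whatever version-error type a port uses.

```python
def detect_tier(front_matter, max_tier=1):
    """Return the document tier, or raise ValueError for unsupported values."""
    if front_matter is None or "_dms_tier" not in front_matter:
        return 0  # bare tier-0 documents need no declaration
    tier = front_matter["_dms_tier"]
    # bool is an int subclass in Python; reject it along with non-integers.
    if isinstance(tier, bool) or not isinstance(tier, int):
        raise ValueError(f"unsupported _dms_tier: {tier!r}")
    if not 0 <= tier <= max_tier:
        raise ValueError(f"unsupported _dms_tier: {tier!r}")
    return tier
```

A tier-0-only decoder would call this with max_tier=1 as well, then reject a result of 1 with its own forward-compat message rather than a generic version error.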
Document types
Document_t0 = { value_tree, comments }
Document_t1 = Document_t0 + { decorators } # strict superset
Languages with subtyping (Python, TS): Document_t1 extends
Document_t0. Languages without (Rust, Go): explicit field — a
Document_t0 is convertible to a Document_t1 with empty
decorators.
A decode_t1 always produces a Document_t1. If the source was
tier-0 (no _dms_tier: 1), the result is a Document_t1 with
an empty decorators list — structurally indistinguishable from
a tier-0 doc round-tripping through tier-1 machinery.
Behavior at the boundary
- decode_t0 on tier-1 input (front matter has _dms_tier: 1): errors immediately with the actionable forward-compat message described in SPEC's reservations section: "_dms_tier: 1 found, but this decoder only supports tier 0. Use decode_t1." Decoder must not attempt to parse the body.
- decode_t1 on tier-0 input: succeeds. Tier-0 logic runs unchanged; the decoration-aware lexer paths are gated on the tier flag and cost nothing at parse time when not active.
- encode_t0 on a Document_t1 with non-empty decorators: errors with "Document has tier-1 decorations at <path>; strip first or use encode_t1." Ports may ship a strip_decorations(doc) → Document_t0 helper for consumers who want a tier-0 projection.
- encode_t1 on a tier-0-shaped Document_t1 (empty decorators): succeeds, emits a tier-0 document (no _dms_tier field).
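The encode-side boundary can be sketched with two small document types. Everything named here is an assumption for illustration: DocumentT0 / DocumentT1 as Python dataclasses, decorator entries as dicts carrying a path field, and check_encodable_t0 as the guard an encode_t0 implementation might run first.

```python
from dataclasses import dataclass, field

@dataclass
class DocumentT0:
    value_tree: object
    comments: list = field(default_factory=list)

@dataclass
class DocumentT1(DocumentT0):
    # Strict superset of DocumentT0: adds only the decorators sidecar.
    decorators: list = field(default_factory=list)

def strip_decorations(doc):
    """Project any document down to tier 0 by dropping the sidecar."""
    return DocumentT0(doc.value_tree, doc.comments)

def check_encodable_t0(doc):
    """Raise if `doc` carries decorations that encode_t0 cannot emit."""
    decorators = getattr(doc, "decorators", [])
    if decorators:
        path = decorators[0]["path"]
        raise ValueError(
            f"Document has tier-1 decorations at {path}; "
            "strip first or use encode_t1."
        )
```

A DocumentT1 with an empty decorators list passes the guard, which is the "tier-0 doc round-tripping through tier-1 machinery" case.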
Lite vs full is orthogonal to tier
decode_t0(source, mode='lite') → value tree only
decode_t0(source, mode='full') → value tree + comments
decode_t1(source, mode='lite') → value tree only (lossy on tier-1 — see warning)
decode_t1(source, mode='full') → value tree + comments + decorators
Lite mode on tier-1 docs is semantically lossy (per "Lite-mode behavior" earlier). Tier-1-capable ports must surface this in their docs; consumers who lite-decode a tier-1 doc and re-emit it have produced a structurally different document.
Conformance per port
| Port profile | Ships | Corpus |
|---|---|---|
| Tier-0-only | decode_t0, encode_t0 | tier-0 (~4695 fixtures) |
| Tier-1-capable | All four | tier-0 + per-dialect tier-1 |
A tier-1-capable port still ships decode_t0 / encode_t0 —
some consumers want strict tier-0 behavior in a tier-1-capable
port (e.g., tooling pipelines that reject tier-1 docs by
policy).
Forward extensibility
A future tier 2 adds decode_t2 / encode_t2 alongside the
existing four. Cumulative — each tier-N decoder accepts tier-N
and below. A port adopting tier 2 ships six functions; no
existing function changes signature.
Mutate API symmetry
Ports that expose a mutate / path-update API split the same way:
mutate_t0 operates on value tree + comments; mutate_t1
preserves decorators across mutations. Tier-1 mutations need to
keep decorations attached to the right node across insertion,
deletion, and reorder. The contract mutate_t1 must satisfy,
and the two implementation strategies (opaque-ID backing or
path-rewriting), are spelled out under Stable node identity
(port-level) below.
Streaming / incremental decode
Streaming is optional per port. A port may ship batch-only
decoders (whole document → Document) without violating spec.
If a port ships a streaming decoder, the tier-1 streaming
behavior is fully determined by the rule below — no per-port
divergence.
Yield-point rule (inherited from tier 0): a streaming
decoder yields a (path, value, decorations?) event when it
has fully parsed a value-tree node.
- Leaf values yield on line completion, after any trailing decoration on that line is attached.
- Container values (block-form table, list, child_block) yield on DEDENT.
- Flow forms ([...], {...}) suspend line-based yields between the open and close brackets — flow_kvs may span lines and the decoder buffers until the close.
Tier-1 addition: decorator-call parens are a yield-suspending
region. A |tag(...) call's parens bound a region treated
exactly like a flow form: yields suspend until the closing ).
Param groups may span lines, and the decoder buffers until the
group closes.
The hoisting wrinkle. When a family declares a
content_slot, the decoder cannot finalize the decorated node's
value-tree value until the param group is fully parsed —
because it needs to check whether the slot key is present and,
if so, hoist it into the value tree. Concretely:
+ |p(class: "lede", children: [..., long flow array spanning lines, ...])
The list-item's value-tree value is either the family's
empty default (if children is absent) or the hoisted array
(if present). The decoder doesn't know which until the param
group closes. The yield for this list-item is therefore deferred
until ). This isn't a new constraint shape — it's the same
suspension flow forms already impose on tier-0 streaming, just
applied to decorator-call parens.
What remains line-streamable in tier 1:
- Leading decoration — yields when the next sibling node yields, with the leading decorations attached. Same as leading comments today.
- Floating decoration — yields on container close. Same as floating comments.
- Inner / trailing decoration — same-line as header / value, no streaming impact beyond what tier-0 already handles.
- Decorator calls with no multi-line params — yield with the line they're on.
- Block-form children of a decorated parent — yield as encountered, identically to tier-0 children under a parent kvpair.
Summary: yield-suspending regions in tier 1 are exactly flow forms + decorator-call parens; hoisting defers a node's yield to the close of its containing param group. Everything else streams as in tier 0.
Per-port API
Each port exposes a path-keyed metadata-lookup API on the decoded
Document so consumers can read tier-1 decoration without
walking the decorators list themselves. Paths are passed as
lists of typed segments (strings for map keys, integers for
list indices), matching the AST path field exactly:
doc.decorations_at(["body", 0]) # all kinds at path
doc.decorations_at(["body", 0], kind="|") # just |-decorations
doc.decoration_value(["body", 0], kind="|")[0].fn
doc.decorations_at([]) # root-level decorations
Spec mandates the lookup contract (path-keyed by typed segments, kind-discriminated, ordered array per kind). Ports may build an internal hash on first lookup to amortize O(1) access over the canonical list shape. Each port picks idiomatic naming on top.
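One way a port might meet the lookup contract is a lazily built index keyed by path. The sketch below is hypothetical: DecorationIndex is not a spec name, and it assumes each sidecar entry is a dict with path, kind and fn fields, which follows the field names used in this section but not any mandated in-memory shape.

```python
class DecorationIndex:
    """Path-keyed view over the canonical decorators list."""

    def __init__(self, decorators):
        self._decorators = decorators
        self._by_path = None  # built on first lookup

    def decorations_at(self, path, kind=None):
        if self._by_path is None:
            self._by_path = {}
            for entry in self._decorators:
                # Lists are unhashable, so the typed-segment path becomes a tuple.
                key = tuple(entry["path"])
                self._by_path.setdefault(key, []).append(entry)
        entries = self._by_path.get(tuple(path), [])
        # Source order within a path is preserved by construction.
        return [e for e in entries if kind is None or e["kind"] == kind]
```

The index goes stale if the decorators list is mutated after the first lookup; a real port would need to invalidate it or route mutations through its own API.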
Stable node identity (port-level)
The canonical sidecar shape is path-keyed: every entry's path
field locates the decorated node in the value tree by typed-
segment path. This is the wire format for the sidecar, the
streaming yield shape, and the inspection / debug-print form.
Path-keying is positional — paths shift when lists are mutated (insert at index 0 → every later index shifts by one). Naive mutation of the value tree can leave decorations pointing at the wrong node. To keep the canonical shape simple while still supporting mutation-heavy workloads, the spec adopts a hybrid model:
- At rest, on the wire, in streaming events: path-keyed, always. No opaque IDs in source, no opaque IDs in serialized sidecars, no opaque IDs in streaming events. The sidecar stays self-describing — path: ["body", 0] means something to a human reading the structure.
- In memory, per-port, optional: ports MAY back the value tree with identity-bearing nodes (wrappers, arenas, object identity) so a mutate_t1 API can keep decorations attached to nodes through arbitrary mutation without per-call path rewriting. The port chooses the representation; spec does not mandate one.
A port that ships mutate_t1 MUST guarantee:
After any sequence of mutate_t1 operations on a Document_t1, a subsequent encode_t1 produces output where each decoration is emitted on the same logical node it was attached to at decode time — modulo replacements, where the caller overwrites a node with a fresh value and the old node's identity (and its decoration) is dropped.
Ports MAY satisfy this contract by:
- Opaque-ID backing. Wrap value-tree nodes in identity-bearing containers internally; sidecar lookups go through identity, not path. Mutations preserve identity for free; replacements drop it explicitly. Port pays a data-model cost (every node carries identity) but mutation is bookkeeping-free.
- Path rewriting. Keep value-tree nodes plain. Each mutate_t1 operation rewrites affected path fields in the sidecar as part of the mutation. Port pays bookkeeping cost but keeps the data model plain. Direct mutation of the value tree (bypassing the API) breaks attachment.
The spec is silent on which strategy a port picks; both satisfy the contract. The choice is between paying the cost in the data model (opaque-ID) or in the mutate API surface (path rewriting).
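To make the path-rewriting strategy concrete, here is a sketch of one mutate_t1-style operation, a list insert. The function names are hypothetical, sidecar entries are assumed to be dicts with a list-valued path field, and index is assumed to be a non-negative position; a full API would need matching rewrites for delete, reorder and key rename.

```python
def rewrite_for_list_insert(decorators, list_path, index):
    """Shift sidecar paths that run through list_path at or after index."""
    prefix = list(list_path)
    n = len(prefix)
    for entry in decorators:
        path = entry["path"]
        if (
            len(path) > n
            and path[:n] == prefix
            and isinstance(path[n], int)
            and path[n] >= index
        ):
            # The decorated node moved one slot to the right.
            entry["path"] = path[:n] + [path[n] + 1] + path[n + 1:]

def insert_t1(value_tree, decorators, list_path, index, value):
    """Insert into a list and keep decorations attached to the same nodes."""
    target = value_tree
    for segment in list_path:
        target = target[segment]
    target.insert(index, value)
    rewrite_for_list_insert(decorators, list_path, index)
```

The value tree stays plain dicts and lists throughout; the cost shows up as the sidecar scan on every structural mutation, which is the trade the path-rewriting bullet describes.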
For ports that ship only decode_t1 / encode_t1 (no
mutate_t1): decoration attachment is preserved across
decode → encode round-trips on unmutated documents. Programmatic
mutation outside a mutate_t1 API is the caller's
responsibility — typical mitigation is to manipulate the sidecar
in tandem with the value tree, or to re-decode after editing
source text directly.
Plain + direct + preserving: pick two
The reason the spec accommodates multiple strategies (rather than
mandating one) is that three properties port authors might want
out of Document_t1 mutation are not jointly satisfiable:
- Plain data model — value tree is unwrapped Map / List / Scalar in the host language's idiomatic types. No wrappers, no identity slots, no per-node ceremony.
- Direct mutation — caller edits the value tree in place, without routing through any mutate API.
- Preservation — decoration stays attached to the same logical node across mutation.
A port can pick any two of the three. Each choice maps to one of the strategies above:
- Opaque-ID backing — gives up #1. Direct mutation + preservation, but the value tree is wrapped (port-internal data-model cost). Suits editor-shaped consumers that mutate heavily and want plain tree[k] = v semantics.
- Path rewriting in mutate_t1 — gives up #2. Plain data model + preservation, but only when mutations route through the API. Suits API-shaped consumers willing to call methods for mutation.
- Pure plain tree, no mutate_t1 — gives up #3. Plain data model + direct mutation, but no preservation guarantee under insertion/reorder. Suits config-shaped workloads that decode → small edit → encode (or re-decode after edits).
This isn't a bug; it's a description of what's available. Each
port picks the pair that matches its workload. The canonical
sidecar shape is path-keyed regardless of the choice — the three
strategies differ only in how a port represents the value tree
in memory and what its mutate surface looks like, not in what
encode_t1 writes to disk.
When to use mutate_t1 (caller-facing rules)
Independent of the port's strategy, callers need a clear rule for when direct value-tree mutation preserves decoration and when it doesn't. The rule follows from path-keying: decoration is attached to the path it was decoded onto. Anything that changes which path identifies which logical node breaks attachment.
Direct mutation is safe (preservation holds) when:
- Leaf value swap at unchanged path. Replacing the scalar at an existing path with another scalar:

      doc.value_tree["server"]["port"] = 8080

- Editing decoration content in place. Params, comment text, position fields — these live inside sidecar entries and aren't path-keyed:

      doc.decorations_at(["body", 0], kind="|")[0].params[0]["class"] = "lede"

- Tail-only append to a list. If no decoration entry has a path through that list at any index, or only through indices already present, appending past the end shifts nothing:

      doc.value_tree["items"].append("new")  # safe iff no decoration on ["items", N+]

- Adding a fresh decoration entry. New entries with new paths don't disturb existing ones.
Direct mutation is unsafe without mutate_t1 (or manual
sidecar rewriting) when:
- List indices shift. Any insertion or deletion at a non-tail index of a list that has decoration on later indices, or any list reorder — paths through later indices now point at the wrong logical nodes.
- Map keys change. Rename, pop + set under a new key, or any operation that removes the key the decoration was attached to — decoration becomes orphaned.
- Subtrees move. Splicing a node from one location to another — decoration entries through the source path are wrong; the destination doesn't get them.
- Non-leaf replacement with structural change. Overwriting a map or list with a value of different shape — decoration through the old structure doesn't fit the new structure.
Operational summary:
If your mutation only changes a leaf value or a decoration's internal contents, edit directly. If your mutation changes which path identifies which logical node (index shift, key change, subtree move, structural overwrite), use mutate_t1, rewrite the sidecar paths yourself, or re-decode after editing source text.
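As an illustration of the "rewrite the sidecar paths yourself" option, the sketch below handles one unsafe case, a non-tail list insert. The helper name `insert_with_rewrite` and the entry shape `{"path": [...], "fn": ...}` are assumptions for this example; this is not the signature or behaviour of mutate_t1.

```python
def insert_with_rewrite(tree, sidecar, list_path, index, value):
    """Insert `value` into the list at `list_path`, then shift sidecar paths.

    `sidecar` is a list of entries shaped like {"path": [...], ...}
    (an assumed shape for this sketch).
    """
    node = tree
    for step in list_path:
        node = node[step]
    node.insert(index, value)

    n = len(list_path)
    prefix = list(list_path)
    for entry in sidecar:
        path = entry["path"]
        # Only paths that run through this list at or after `index` move.
        runs_through = len(path) > n and path[:n] == prefix
        if runs_through and isinstance(path[n], int) and path[n] >= index:
            path[n] += 1


tree = {"body": ["intro", "details"]}
sidecar = [{"path": ["body", 1], "fn": "p"}]

insert_with_rewrite(tree, sidecar, ["body"], 0, "banner")
print(tree["body"], sidecar[0]["path"])
# ['banner', 'intro', 'details'] ['body', 2]
```

Deletions, reorders, key renames, and subtree moves each need their own rewrite rule, which is the bookkeeping a port's mutation API would otherwise absorb.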
This rule applies regardless of which strategy a port adopted:
- Opaque-ID-backed ports make the unsafe cases safe automatically (identity survives; sidecar lookups don't depend on path), so the rule is advisory rather than load-bearing.
- Path-rewriting ports rely on the rule strictly: direct mutation in the unsafe categories silently breaks attachment.
- No-mutate_t1 ports offer no recourse: callers either stay in the safe set, do their own sidecar bookkeeping, or re-decode.
The rule is format-level, not port-level: any tier-1
consumer can use it to decide whether doc.value_tree[...] = x
is fine or whether they need a heavier mutation path.
Why hybrid. Pure path-keyed leaves mutation safety unsolved
and forces every consumer that edits a Document_t1 in place to
write its own path-bookkeeping. Pure opaque-ID forces a
data-model regression: every value-tree node would need an identity
slot, breaking the "value tree is plain Map / List / Scalar"
contract that tier 0 keeps and that tier 1 inherits. The hybrid
keeps the canonical shape clean for interop (path-keyed
everywhere it matters across ports) while letting mutation-heavy
ports adopt opaque-ID backing as an internal implementation
choice.
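A rough sketch of how the hybrid could look inside a mutation-heavy port, under stated assumptions: decorations are held internally against node identity (Python's `id()` of container nodes that stay alive in the tree), and the canonical path-keyed form is derived on export. The names `by_identity` and `export_path_keyed` are hypothetical, and decorating scalars would need a different identity mechanism than the one shown.

```python
def export_path_keyed(tree, by_identity):
    """Walk the tree and emit canonical path-keyed sidecar entries."""
    out = []

    def walk(node, path):
        if isinstance(node, (list, dict)):
            deco = by_identity.get(id(node))
            if deco is not None:
                out.append({"path": list(path), "fn": deco})
        if isinstance(node, list):
            for i, child in enumerate(node):
                walk(child, path + [i])
        elif isinstance(node, dict):
            for key, child in node.items():
                walk(child, path + [key])

    walk(tree, [])
    return out


first, second = ["first item"], ["second item"]
tree = [first, second]
by_identity = {id(second): "li"}  # internal, identity-keyed store

before = export_path_keyed(tree, by_identity)

# A non-tail insert: unsafe for a purely path-keyed store,
# harmless here because the exported path is recomputed.
tree.insert(0, ["new item"])
after = export_path_keyed(tree, by_identity)

print(before, after)
# [{'path': [1], 'fn': 'li'}] [{'path': [2], 'fn': 'li'}]
```

The value tree itself stays plain lists, maps, and scalars; the identity map lives beside it and never appears in the exported form.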
Conformance
- Tier-0 conformance corpus (~4695 fixtures) stays as-is, unchanged.
- Tier-1 adds its own corpus, scoped per dialect: dms+html fixtures, dms+markup fixtures, etc.
- Each port declares which dialects it supports. Conformance becomes a matrix of (port × dialect → pass count) rather than a single number.
- Tier-0 conformance per port is unchanged: every port still hits 4695 / 4695 on tier-0 fixtures.
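For illustration, the matrix can be tallied from per-fixture results as sketched below. The port names and the results are placeholders made up for the example, not real conformance data, and `conformance_matrix` is a hypothetical helper.

```python
from collections import Counter


def conformance_matrix(results):
    """Tally passing fixtures per (port, dialect) cell.

    `results` is an iterable of (port, dialect, passed) tuples.
    Cells with zero passes are kept so gaps stay visible.
    """
    matrix = Counter()
    for port, dialect, passed in results:
        matrix[(port, dialect)] += int(passed)
    return dict(matrix)


# Placeholder results; "port-a" and "port-b" are hypothetical ports.
results = [
    ("port-a", "dms+html", True),
    ("port-a", "dms+html", False),
    ("port-b", "dms+html", True),
    ("port-b", "dms+markup", False),
]

matrix = conformance_matrix(results)
print(matrix)
# {('port-a', 'dms+html'): 1, ('port-b', 'dms+html'): 1, ('port-b', 'dms+markup'): 0}
```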
Negatives accepted in this design
These are tier-1 properties that are not bugs but real costs of the approach:
- Lite mode loses element identity for decorated documents. Acknowledged; spec language fences off the semantic-loss warning.
- Tier-1 → other-format export is muddy. No clean serialization of the value tree + decoration sidecar to JSON, since JSON has no slot for the parallel sidecar. Consumers who need an interop story choose a flattening convention.
- Mutation bookkeeping is per-port. The canonical sidecar is path-keyed; mutation safety under mutate_t1 is a port-level implementation choice (opaque-ID backing, or path-rewriting in the API). Direct mutation of the value tree, bypassing mutate_t1, breaks decoration attachment in path-keyed ports by design.
- Mixed inline content goes vertical in block form. Mitigated by the family's content_slot in flow form for inline runs (e.g., HTML's children: param).
- Family / dialect registry coordination cost. Cross-port consistency requires every port that claims dialect support to implement the same families with the same semantics.
- Encoder must dispatch on family. No longer a single walker; needs a per-family renderer registry.
- Schema validation splits in two. Structural validation on the value tree, semantic validation on the decoration sidecar.
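To make the export cost concrete, here is one possible flattening convention, shown only as an illustration: the value tree and the sidecar are bundled into a single JSON object. The wrapper keys `value` and `decorators` and the function name `flatten_to_json` are assumptions for this sketch, not something the spec defines.

```python
import json


def flatten_to_json(value_tree, decorators):
    """Bundle tree and sidecar into one JSON object (hypothetical convention)."""
    return json.dumps({"value": value_tree, "decorators": decorators})


value_tree = [["first item"], ["second item"]]
decorators = [
    {"path": [0], "|": [{"family": "tag", "fn": "li", "params": [{}]}]},
    {"path": [1], "|": [{"family": "tag", "fn": "li", "params": [{}]}]},
]

text = flatten_to_json(value_tree, decorators)
restored = json.loads(text)

# A JSON-only consumer that reads just "value" sees bare nested lists:
# the element identity lives entirely in the "decorators" member.
print(restored["value"])
# [['first item'], ['second item']]
```

Any consumer of such an export has to know the convention to reattach decorations, which is the interop cost the list above describes.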
Open questions parked for later
- Dialect registry governance + naming.
- Editor / IDE story per dialect (syntax highlighting, completion, schema-aware errors).
- Range-specifier syntax in files (^1.0.0, ~1.0.0); parked pending real-user pressure once dialects mature.
Worked example: HTML in dms+html
The full dms+html dialect spec — families, typed attributes, content semantics, tag inventory, versioning — lives in dialects/dms+html.md. What follows is a single end-to-end example showing source, decoded value tree, and decoration sidecar.
+++
_dms_tier: 1
_dms_imports:
  + dialect: "html"
    version: "1.0.0"
    ns: "html"
+++
+ |html(lang: "en")
  + |head
    + |title "DMS feature tour"
    + |meta(charset: "UTF-8")
    + |link(rel: "stylesheet", href: "style.css")
  + |body(class: "main", id: "root")
    + |h1 "Welcome to DMS"
    + |p(class: "lede")
      + "Click "
      + |a(href: "/spec.html") "here"
      + " to read the spec."
    + |ul(class: "items")
      + |li "first item"
      + |li "second item"
Decoded value tree (lite mode equivalent):
[
  [
    [["DMS feature tour"]],
    [
      ["Welcome to DMS"],
      ["Click ", ["here"], " to read the spec."],
      [["first item"], ["second item"]]
    ]
  ]
]
Decoration sidecar (full mode), shown in DMS:
decorators:
- { path: [0], "|": [{ family: "tag", fn: "html", params: [{ lang: "en" }] }] }
- { path: [0, 0], "|": [{ family: "tag", fn: "head", params: [{}] }] }
- { path: [0, 0, 0], "|": [{ family: "tag", fn: "title", params: [{}] }] }
- { path: [0, 0, 1], "|": [{ family: "tag", fn: "meta", params: [{ charset: "UTF-8" }] }] }
- { path: [0, 0, 2], "|": [{ family: "tag", fn: "link", params: [{ rel: "stylesheet", href: "style.css" }] }] }
- { path: [0, 1], "|": [{ family: "tag", fn: "body", params: [{ class: "main", id: "root" }] }] }
- { path: [0, 1, 0], "|": [{ family: "tag", fn: "h1", params: [{}] }] }
- { path: [0, 1, 1], "|": [{ family: "tag", fn: "p", params: [{ class: "lede" }] }] }
- { path: [0, 1, 1, 1], "|": [{ family: "tag", fn: "a", params: [{ href: "/spec.html" }] }] }
- { path: [0, 1, 2], "|": [{ family: "tag", fn: "ul", params: [{ class: "items" }] }] }
- { path: [0, 1, 2, 0], "|": [{ family: "tag", fn: "li", params: [{}] }] }
- { path: [0, 1, 2, 1], "|": [{ family: "tag", fn: "li", params: [{}] }] }
(params_dec and position fields are elided here; in this example
both take their defaults, empty and "leading" respectively.)
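To show how the two halves recombine, the sketch below renders the `body` subtree of the example back to HTML by walking the value tree and looking up each path in the sidecar. It is a minimal illustration, not the dms+html encoder: the `tags` dict is a hand-simplified form of the sidecar entries above (path tuple to tag name and attributes), `render` is a hypothetical helper, and the head subtree and void elements are left out.

```python
from html import escape

# The body subtree of the decoded value tree, i.e. value_tree[0][1].
body = [
    ["Welcome to DMS"],
    ["Click ", ["here"], " to read the spec."],
    [["first item"], ["second item"]],
]

# Simplified sidecar for that subtree: path -> (fn, first params map).
tags = {
    (0, 1): ("body", {"class": "main", "id": "root"}),
    (0, 1, 0): ("h1", {}),
    (0, 1, 1): ("p", {"class": "lede"}),
    (0, 1, 1, 1): ("a", {"href": "/spec.html"}),
    (0, 1, 2): ("ul", {"class": "items"}),
    (0, 1, 2, 0): ("li", {}),
    (0, 1, 2, 1): ("li", {}),
}


def render(node, path):
    """Render a node, wrapping it in a tag if its path is decorated."""
    if isinstance(node, str):
        return escape(node)
    inner = "".join(render(child, path + (i,)) for i, child in enumerate(node))
    if path not in tags:
        return inner  # undecorated list: only its children appear
    fn, attrs = tags[path]
    attr_text = "".join(f' {k}="{escape(v)}"' for k, v in attrs.items())
    return f"<{fn}{attr_text}>{inner}</{fn}>"


html_text = render(body, (0, 1))
print(html_text)
```

With an empty `tags` dict the same walk yields only the concatenated text, which is the lite-mode loss of element identity in miniature.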