dms+html — DMS dialect for HTML

Brand: dms+html
File extension: .dms.html
Spec version: 0.1 (draft)
DMS tier required: 1 (_dms_tier: 1 in front matter)
Parent spec: TIER1.md
Tier-0 base: SPEC.md

Pre-1.0: breaking changes are still possible. The version-bump rules in the Versioning section are not yet binding.

What this is

dms+html is a tier-1 dialect that lets you write HTML-shaped documents in DMS. Each |tag(...) call produces a single HTML element: the value tree carries the element's children, and the decoration sidecar carries its tag name and attributes.

Why a dialect rather than direct HTML? The promise of DMS tier 1 — element-shaped data without burying the values — applies here. Compared to HTML/XML, dms+html keeps text content in the value-column, separates structural metadata from content, preserves comments through round-trip, and inherits tier-0's small-and-strict grammar. Compared to JSON-modeled HTML (the {tag: "p", attrs: {...}, children: [...]} AST shape), it stays indent-readable and grep-friendly.

dms+html is not a serialization format for HTML — it's a DMS-shaped representation of HTML-like documents. A separate runtime / render layer converts a decoded Document_t1 to an HTML byte stream (and back) when an actual .html file is needed.

Quick comparison

In HTML:

<html lang="en">
  <head>
    <title>DMS feature tour</title>
    <meta charset="UTF-8">
  </head>
  <body class="main">
    <h1>Welcome to DMS</h1>
    <p class="lede">
      Click <a href="/spec.html">here</a> to read the spec.
    </p>
  </body>
</html>

In dms+html:

+++
_dms_tier: 1
_dms_imports:
  + dialect: "html"
    version: "1.0.0"
+++

+ |html(lang: "en")
  + |head
    + |title "DMS feature tour"
    + |meta(charset: "UTF-8")
  + |body(class: "main")
    + |h1 "Welcome to DMS"
    + |p(class: "lede")
      + "Click "
      + |a(href: "/spec.html") "here"
      + " to read the spec."
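
Read as data, the |p element above splits into children (value tree) and tag name plus attributes (decoration). The Python sketch below is only a mental model of that split: the Element class and its field names are hypothetical and do not claim to match the Document_t1 structure defined in TIER1.md.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical in-memory view of one |tag(...) call."""
    tag: str                                       # decoration: tag name
    params: dict = field(default_factory=dict)     # decoration: attributes
    children: list = field(default_factory=list)   # value tree: text and nested elements


def text_content(node) -> str:
    """Concatenate the scalar children of a node, depth-first."""
    if isinstance(node, Element):
        return "".join(text_content(child) for child in node.children)
    return str(node)


lede = Element("p", {"class": "lede"}, [
    "Click ",
    Element("a", {"href": "/spec.html"}, ["here"]),
    " to read the spec.",
])

print(text_content(lede))  # Click here to read the spec.
```

The point of the model is that the text stays reachable without touching tag names or attributes, which is the separation the dialect is built around.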

Realistic example

A landing-page hero section — the kind of snippet every front-end project has. Native HTML first, then the dms+html equivalent, then a note on what the DMS representation gains.

Native HTML

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta name="description" content="DMS makes structured data readable.">
  <meta name="og:title" content="DMS — Document Markup System">
  <link rel="stylesheet" href="/assets/main.css">
  <link rel="preload" href="/assets/hero.webp" as="image">
  <title>DMS — Document Markup System</title>
</head>
<body>
  <header id="site-header" class="sticky top-0">
    <nav>
      <a href="/" class="logo">DMS</a>
      <ul class="nav-links">
        <li><a href="/spec.html">Spec</a></li>
        <li><a href="/dialects.html">Dialects</a></li>
        <li><a href="https://github.com/dms-lang/dms">GitHub</a></li>
      </ul>
    </nav>
  </header>

  <main>
    <section class="hero">
      <h1>Structured data that stays readable</h1>
      <p class="subheading">
        DMS is a small, strict grammar for configuration, manifests,
        and data interchange — with comments that survive round-trips.
      </p>

      <ul class="feature-list">
        <li>Type-checked named params — no silent <code>port: "80"</code> coercions</li>
        <li>Comments preserved on decode → mutate → re-emit</li>
        <li>Grep-friendly: <code>grep '|button'</code> returns every button</li>
        <li>Dialect system for HTML, HCL, KDL, RON, k8s, and more</li>
      </ul>

      <div class="cta-group">
        <a href="/spec.html">
          <button type="button" class="btn btn-primary" tabindex="0">
            Read the spec
          </button>
        </a>
        <a href="https://github.com/dms-lang/dms">
          <button type="button" class="btn btn-secondary" tabindex="0">
            View on GitHub
          </button>
        </a>
      </div>
    </section>

    <section class="features" id="features">
      <h2>Why DMS?</h2>
      <p>
        Every format solves the readability problem differently. DMS bets
        on small grammar + explicit decoration sidecar + strong types.
      </p>
    </section>
  </main>
</body>
</html>

dms+html equivalent

+++
_dms_tier: 1
_dms_imports:
  + dialect: "html"
    version: "1.0.0"
+++

# document comment — survives round-trip; dropped by browsers
+ |html(lang: "en")
  + |head
    + |meta(charset: "UTF-8")
    + |meta(name: "viewport", content: "width=device-width, initial-scale=1.0")
    + |meta(name: "description", content: "DMS makes structured data readable.")
    + |meta(name: "og:title",    content: "DMS — Document Markup System")
    + |link(rel: "stylesheet", href: "/assets/main.css")
    + |link(rel: "preload",    href: "/assets/hero.webp", as: "image")
    + |title "DMS — Document Markup System"
  + |body
    + |header(id: "site-header", class: "sticky top-0")
      + |nav
        + |a(href: "/", class: "logo") "DMS"
        + |ul(class: "nav-links")
          + |li
            + |a(href: "/spec.html") "Spec"
          + |li
            + |a(href: "/dialects.html") "Dialects"
          + |li
            + |a(href: "https://github.com/dms-lang/dms") "GitHub"
    + |main
      + |section(class: "hero")
        + |h1 "Structured data that stays readable"
        + |p(class: "subheading")
          + "DMS is a small, strict grammar for configuration, manifests,"
          + " and data interchange — with comments that survive round-trips."
        + |ul(class: "feature-list")
          # each li is a stable query target: grep '|li' finds all items
          + |li
            + "Type-checked named params — no silent "
            + |code "port: \"80\""
            + " coercions"
          + |li "Comments preserved on decode → mutate → re-emit"
          + |li
            + "Grep-friendly: "
            + |code "grep '|button'"
            + " returns every button"
          + |li "Dialect system for HTML, HCL, KDL, RON, k8s, and more"
        + |div(class: "cta-group")
          + |a(href: "/spec.html")
            # tabindex: 0 is integer — the dialect type-checks it
            + |button(type: "button", class: "btn btn-primary", tabindex: 0) "Read the spec"
          + |a(href: "https://github.com/dms-lang/dms")
            + |button(type: "button", class: "btn btn-secondary", tabindex: 0) "View on GitHub"
      + |section(class: "features", id: "features")
        + |h2 "Why DMS?"
        + |p
          + "Every format solves the readability problem differently. DMS bets"
          + " on small grammar + explicit decoration sidecar + strong types."

What's visible in DMS that's lost or obscured in HTML

Comments round-trip. The # document comment at the top and the # each li is a stable query target comment inside the feature list survive decode → mutate → re-emit unchanged. HTML comments are more fragile in practice: browsers keep them as comment nodes in the DOM but never render them, and minifiers and other build steps routinely strip them. DMS's comment AST preserves them as first-class nodes.

Type-checked attributes. tabindex: 0 is an integer in the dialect spec (tabindex: { type: "integer" }). If you write tabindex: "zero" or tabindex: 0.5, the decoder rejects it. HTML serializes all attributes as strings; tabindex="0" and tabindex="yes" are both legal text — type errors are silent until the browser ignores or misinterprets the value.

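Structured selector. Running grep '|button' on a .dms.html file finds every line that mentions a button element, and grep '|button(.*tabindex' narrows that to buttons with a tabindex param: a predicate on two decoration fields at once. This works because the tag name and its params sit together on one line in the examples here. The HTML counterpart, grep '<button[^>]*tabindex', matches only when the whole open tag is on one line, and HTML open tags are often wrapped across several lines, so a reliable query needs a parser or a more fragile regex.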
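
The same line-oriented query can be scripted. The sketch below is a rough Python equivalent of the tabindex grep, assuming (as in the examples in this document) that a tag and its params share one line; the function name and sample text are illustrative only.

```python
import re

# Matches a |button call whose parameter list mentions tabindex on the same line.
BUTTON_WITH_TABINDEX = re.compile(r"\|button\(.*\btabindex\s*:")


def matching_lines(source: str) -> list:
    """Return the 1-based line numbers on which the pattern matches."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if BUTTON_WITH_TABINDEX.search(line)
    ]


sample = '''\
+ |a(href: "/spec.html")
  + |button(type: "button", class: "btn btn-primary", tabindex: 0) "Read the spec"
+ |button(type: "submit") "Send"
'''

print(matching_lines(sample))  # [2]
```

Like the grep, this is a text match rather than a parse: a string value that happens to contain the same characters would also match.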

Dialect canonical spec

+++
_dms_tier: 0
+++

name:             "html"
version:          "1.0.0"
version_strategy: "caret"          # default; npm-style ^

families:
  + name:           "tag"
    default_sigils: ["|"]
    empty_default:  []
    content_slot:   "children"
    params:
      mode: "wildcard_with_typed"
      typed:
        # ── Global attributes (any HTML element) ─────────────
        id:               { type: "string" }
        class:            { type: "string" }
        lang:             { type: "string" }
        dir:              { type: "string" }
        title:            { type: "string" }
        style:            { type: "string" }
        tabindex:         { type: "integer" }
        hidden:           { type: "boolean" }
        accesskey:        { type: "string" }
        contenteditable:  { type: "string" }
        spellcheck:       { type: "boolean" }
        draggable:        { type: "boolean" }
        translate:        { type: "string" }
        role:             { type: "string" }

        # ── Common element-specific (string-typed) ──────────
        href:             { type: "string" }
        src:              { type: "string" }
        alt:              { type: "string" }
        type:             { type: "string" }
        name:             { type: "string" }
        value:            { type: "string" }
        placeholder:      { type: "string" }
        for:              { type: "string" }
        action:           { type: "string" }
        method:           { type: "string" }
        rel:              { type: "string" }
        target:           { type: "string" }
        charset:          { type: "string" }
        content:          { type: "string" }   # literal attribute on <meta>
        as:               { type: "string" }
        crossorigin:      { type: "string" }
        integrity:        { type: "string" }
        loading:          { type: "string" }
        decoding:         { type: "string" }
        referrerpolicy:   { type: "string" }

        # ── Boolean attributes ──────────────────────────────
        disabled:         { type: "boolean" }
        checked:          { type: "boolean" }
        required:         { type: "boolean" }
        readonly:         { type: "boolean" }
        multiple:         { type: "boolean" }
        selected:         { type: "boolean" }
        autofocus:        { type: "boolean" }
        autoplay:         { type: "boolean" }
        controls:         { type: "boolean" }
        loop:             { type: "boolean" }
        muted:            { type: "boolean" }
        defer:            { type: "boolean" }
        async:            { type: "boolean" }
        open:             { type: "boolean" }
        reversed:         { type: "boolean" }
        novalidate:       { type: "boolean" }

        # ── Numeric attributes ──────────────────────────────
        width:            { type: "integer" }
        height:           { type: "integer" }
        cols:             { type: "integer" }
        rows:             { type: "integer" }
        size:             { type: "integer" }
        maxlength:        { type: "integer" }
        minlength:        { type: "integer" }
        step:             { type: "integer" }
        min:              { type: "integer" }
        max:              { type: "integer" }
        span:             { type: "integer" }
        colspan:          { type: "integer" }
        rowspan:          { type: "integer" }
        start:            { type: "integer" }

The wildcard_with_typed mode means: the listed keys are type-checked, but any other key passes through unchecked. That covers:

data-* attributes (custom data)
aria-* attributes (accessibility — ~50 attrs, all strings, not worth enumerating)
on* event handlers (onclick, onload, etc.)
Framework-specific attributes (x-* for Alpine, wire:* for Livewire, etc.)
New HTML attributes added in future revisions

Tier-1 ports decode any of these as plain string params; the runtime layer interprets them.
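
As a rough illustration of wildcard_with_typed, the Python sketch below checks a params mapping against a small excerpt of the typed table and lets every other key through. The function name, the error format, and the mapping of DMS types to Python types are assumptions made for this example, not part of the dialect.

```python
# Excerpt of the typed table above, mapped to Python types (an assumption).
TYPED = {
    "id": str, "class": str, "href": str,
    "tabindex": int, "width": int,
    "hidden": bool, "disabled": bool,
}


def check_params(params: dict) -> list:
    """Return a list of error strings; an empty list means the params pass."""
    errors = []
    for key, value in params.items():
        expected = TYPED.get(key)
        if expected is None:
            continue  # wildcard key (data-*, aria-*, on*, ...): passes unchecked
        # bool is a subclass of int in Python, so reject True/False for integers.
        ok = isinstance(value, expected) and not (
            expected is int and isinstance(value, bool)
        )
        if not ok:
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


print(check_params({"tabindex": 0, "data-id": "42"}))  # []
print(check_params({"tabindex": "zero"}))  # ['tabindex: expected int, got str']
print(check_params({"tabindex": 0.5}))     # ['tabindex: expected int, got float']
```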

Content semantics

The tag family's content is a list of children. A child is:

A scalar (string, number, boolean, datetime) — text content.
A nested decorated value — another |tag call, which becomes a child element.

The children list has three equivalent source forms, which are mutually exclusive on any single element:

Block form (most common)

+ |p(class: "lede")
  + "Click "
  + |a(href: "/spec.html") "here"
  + " to read."

Each + item under the indented child block becomes an element of the children list.

Flow form

+ |p(class: "lede", children: ["Click ", |a(href: "/spec.html") "here", " to read."])

The children: param is hoisted into the value tree on decode. Useful for short inline runs.

Inline base_value form (single-child shorthand)

+ |title "DMS feature tour"
+ |h1 "Welcome to DMS"

Equivalent to |title(children: ["DMS feature tour"]) — the inline scalar is wrapped in a single-element children list.

When no form is used

+ |meta(charset: "UTF-8")
+ |br
+ |hr

The element has no children. The value tree gets the family's empty_default ([]), an empty children list. This is the form for self-closing / void elements.

Conflict

Specifying content via two of these forms on the same element is a parse error (the tier-1 conflict rule applies). Pick one.
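
The three forms, the empty default, and the conflict rule can be summarized in a few lines. This is a minimal Python sketch under stated assumptions: the function and argument names are invented here, and a real decoder would work on parsed source rather than on pre-split arguments.

```python
def resolve_children(block=None, flow=None, inline=None, empty_default=()):
    """Pick the children list from whichever source form was used.

    block:  items of the indented + block, or None if absent
    flow:   value of the children: param, or None if absent
    inline: the inline base value, or None if absent
    """
    forms = (("block", block), ("flow", flow), ("inline", inline))
    used = [name for name, value in forms if value is not None]
    if len(used) > 1:
        # Mirrors the conflict rule: two forms on one element is an error.
        raise ValueError("content given in more than one form: " + ", ".join(used))
    if block is not None:
        return list(block)
    if flow is not None:
        return list(flow)
    if inline is not None:
        return [inline]  # single-child shorthand is wrapped in a list
    return list(empty_default)


print(resolve_children(inline="DMS feature tour"))  # ['DMS feature tour']
print(resolve_children())                           # []
```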

Encoder canonical form

The encoder emits each tag using a deterministic heuristic (inherited from TIER1.md §"Encoder canonical form"):

Inline base_value form when children is a single scalar (|h1 "...", |li "...").
Block form when children has > 1 element OR any element is a nested non-scalar.
Flow children: form when all elements are scalars / decorated scalars and the line fits the width threshold (typical mixed-text-with-spans paragraphs).
No content emission when children matches the empty default [] (self-closing / void elements).
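
One way to read the four rules is as an ordered decision. The Python sketch below does that, with two loudly labeled assumptions: the list above does not say whether flow or block wins when both apply, so the sketch tries flow first, and the 80-column threshold and the caller-supplied flow_width estimate are placeholders rather than values from TIER1.md. Nested elements are modeled as plain dicts for brevity.

```python
def is_scalar(value):
    return isinstance(value, (str, int, float, bool))


def is_decorated_scalar(value):
    """A nested tag whose only child is a scalar, e.g. |a(...) "here"."""
    return (
        isinstance(value, dict)
        and len(value["children"]) == 1
        and is_scalar(value["children"][0])
    )


def choose_form(children, flow_width=None, max_width=80):
    """Return 'none', 'inline', 'flow', or 'block' for a children list.

    flow_width is the caller's estimate of the flow-form line length.
    """
    if not children:
        return "none"  # matches the empty default []
    if len(children) == 1 and is_scalar(children[0]):
        return "inline"
    flat = all(is_scalar(c) or is_decorated_scalar(c) for c in children)
    # Assumed precedence: flow is preferred over block when it fits.
    if flat and flow_width is not None and flow_width <= max_width:
        return "flow"
    return "block"


mixed = ["Click ", {"tag": "a", "children": ["here"]}, " to read."]
print(choose_form(["Welcome to DMS"]))       # inline
print(choose_form(mixed, flow_width=70))     # flow
print(choose_form(mixed, flow_width=120))    # block
```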

Tag inventory (representative)

dms+html validates parameters at the family level, not per-tag. The decoder accepts any tag name; per-tag rules (<input> requires type, <img> requires src and alt, etc.) live in the runtime / render layer, not in the DMS decoder.

Representative tags covered by this first cut:

Tag	Content shape	Notable attrs
`html`	block container	`lang`, `dir`
`head`	block container	(global only)
`body`	block container	`class`, `id`
`title`	inline-text	(global only)
`meta`	self-closing	`charset`, `name`, `content`, `http-equiv`
`link`	self-closing	`rel`, `href`, `type`, `as`
`script`	text-content	`src`, `type`, `defer`, `async`
`style`	text-content	`media`, `type`
`h1`–`h6`	inline / mixed	(global only)
`p`	mixed content	`class`, `id`
`div`	mixed content	(global)
`span`	inline mixed	(global)
`a`	inline mixed	`href`, `target`, `rel`
`ul`/`ol`	list of `li`	`start`, `reversed`
`li`	mixed content	`value`
`img`	self-closing	`src`, `alt`, `width`, `height`, `loading`
`br`/`hr`	self-closing	(global only)
`form`	block container	`action`, `method`, `enctype`
`input`	self-closing	`type`, `name`, `value`, `placeholder`
`button`	inline mixed	`type`, `disabled`
`table`	block container	(global)
`tr`/`td`/`th`	block / mixed	`colspan`, `rowspan`

This is not the full HTML5 element registry — it's a sampling to cover the common shapes (block container, mixed content, self-closing, text-content). Any HTML5 tag works regardless of whether it appears here, since the dialect doesn't validate per-tag.

Versioning

dms+html follows semver. The first published version is 1.0.0. The default match strategy is caret (^1.0.0 matches any 1.x.x ≥ 1.0.0).
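
For readers unfamiliar with caret ranges, here is a small Python sketch of npm-style caret matching. It handles plain MAJOR.MINOR.PATCH versions only and ignores pre-release tags; the function names are invented for this example, and how DMS tooling actually resolves versions is not specified here.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def caret_matches(base, candidate):
    """True if candidate satisfies ^base under npm-style caret rules."""
    b, c = parse_version(base), parse_version(candidate)
    if c < b:
        return False
    if b[0] > 0:
        return c[0] == b[0]      # ^1.2.3 allows any 1.x.x >= 1.2.3
    if b[1] > 0:
        return c[:2] == b[:2]    # ^0.2.3 allows 0.2.x >= 0.2.3
    return c == b                # ^0.0.3 allows only 0.0.3


print(caret_matches("1.0.0", "1.4.2"))  # True
print(caret_matches("1.0.0", "2.0.0"))  # False
```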

What counts as a version bump:

Breaking (major). Changing the content_slot name; renaming a typed attribute; changing a typed attribute's type; switching modes from wildcard_with_typed to strict; removing a typed attribute (since previously-conformant docs may rely on its typing).
Additive (minor). Adding a new typed attribute. Because mode is wildcard_with_typed, the attribute previously passed through as an unchecked wildcard, so typing it tightens validation: a document that wrote a wrong-typed value for it would now fail. This still counts as a minor bump under the dialect's semver policy because it adds a constraint rather than removing a feature.
Bug fix (patch). Documentation clarifications, correcting a typo in a typed attribute's name (with deprecation), fixing inconsistencies in the dialect spec text.

File extension

dms+html documents use the .dms.html extension. The double extension lets editors and command-line tools pick out dms+html files by name alone (for example with the glob *.dms.html or grep --include='*.dms.html'), without parsing front matter:

my-page.dms.html       # dms+html document
my-config.dms          # plain DMS (tier 0 or tier 1 with non-html dialects)

A dms+html document must declare _dms_tier: 1 and import the html dialect in _dms_imports regardless of file extension; the extension is a tooling hint, not a binding declaration.
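
The difference between the hint and the binding declaration can be shown in a few lines of Python. The sketch assumes the front matter has already been decoded into a dict shaped like the examples in this document; both function names are hypothetical.

```python
def has_extension_hint(filename):
    """Tooling hint only: the file name suggests dms+html."""
    return filename.endswith(".dms.html")


def declares_dms_html(front_matter):
    """Binding check: tier 1 plus an import of the html dialect."""
    imports = front_matter.get("_dms_imports", [])
    return front_matter.get("_dms_tier") == 1 and any(
        entry.get("dialect") == "html" for entry in imports
    )


front = {"_dms_tier": 1, "_dms_imports": [{"dialect": "html", "version": "1.0.0"}]}
print(has_extension_hint("my-page.dms.html"))  # True
print(declares_dms_html(front))                # True
print(declares_dms_html({"_dms_tier": 0}))     # False
```

A file named my-config.dms whose front matter passes declares_dms_html is still a dms+html document; the reverse is not true.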

Worked example

A complete dms+html document with the corresponding value tree and decoration sidecar is in TIER1.md §"Worked example: HTML in dms+html".

What's not in scope (this dialect)

Rendering / serialization to HTML byte stream. That's the runtime / render layer. Producing an .html file from a decoded Document_t1 is a separate tool / library.
Per-tag validation. "input requires type", "img requires src and alt", "table cells must be inside tr", etc. — runtime layer concerns. The DMS decoder accepts any tag with any attrs.
Custom elements / web components. Tag names with - (e.g., my-component) work because - is a valid character in tier-0 bare keys, but the dialect does no special handling — they're just tags.
SVG, MathML, foreign content. Each gets its own dialect if/when a real need surfaces (dms+svg, dms+mathml).
Doctype. dms+html documents don't carry a doctype; the render layer prepends <!DOCTYPE html> when emitting HTML5.
Script / style content escaping. The dialect treats script and style element content as plain strings; the render layer is responsible for whatever raw-text handling HTML serialization needs (for example, a literal </script> inside a script body).

Open questions for v0.2+

Per-tag empty defaults. Some elements are pure self-closing (<meta>, <link>, <br>, <hr>) and have no meaningful "empty default of []" — they're always empty. Currently the family-level empty_default: [] covers them, but a future revision could add per-tag overrides if a render layer wants tighter signaling.
Boolean attribute serialization. HTML serializes boolean true as the bare attribute name (<input disabled>) and false by omission. The dialect carries booleans as plain values; this convention belongs in the render layer.
Attribute order preservation. Tier-1's path-keyed sidecar preserves source order on decode; encoder canonical form re-emits in source order. HTML serialization may want a different order (e.g., alphabetical) — render layer concern.
Inline expressions. ${var} interpolation, JSX-style {expr}, server-side templating syntax — none of these are in dms+html. A dms+jsx or dms+templating dialect could add them.
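
To show where the render-layer concerns above would land, here is a deliberately small Python sketch of an HTML emitter. It is not part of dms+html: nodes are modeled as plain dicts with hypothetical tag, params, and children keys, attribute order follows dict insertion order as a stand-in for source order, and script / style raw-text handling is left out.

```python
from html import escape

# HTML void elements: no children and no closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


def render_attrs(params, sort=False):
    """Serialize params; True becomes a bare name, False is omitted."""
    items = sorted(params.items()) if sort else params.items()
    out = []
    for name, value in items:
        if value is True:
            out.append(f" {name}")
        elif value is False:
            continue
        else:
            out.append(f' {name}="{escape(str(value), quote=True)}"')
    return "".join(out)


def render(node):
    """Render a scalar as escaped text, or a dict node as an element."""
    if not isinstance(node, dict):
        return escape(str(node), quote=False)
    open_tag = f"<{node['tag']}{render_attrs(node.get('params', {}))}>"
    if node["tag"] in VOID:
        return open_tag
    inner = "".join(render(child) for child in node.get("children", []))
    return f"{open_tag}{inner}</{node['tag']}>"


def render_document(root):
    """Prepend the HTML5 doctype, as the render layer is expected to."""
    return "<!DOCTYPE html>\n" + render(root)


checkbox = {"tag": "input",
            "params": {"type": "checkbox", "disabled": True, "checked": False}}
print(render(checkbox))  # <input type="checkbox" disabled>
```

Passing sort=True to render_attrs gives the alternative alphabetical attribute order mentioned above.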