dms+html — DMS dialect for HTML

Brand: dms+html
File extension: .dms.html
Spec version: 0.1 (draft)
DMS tier required: 1 (_dms_tier: 1 in front matter)
Parent spec: TIER1.md
Tier-0 base: SPEC.md

Pre-1.0: breaking changes are still possible. No version-bump rules apply yet.

What this is

dms+html is a tier-1 dialect that lets you write HTML-shaped documents in DMS. Each |tag(...) call produces a single HTML element: the value tree carries the element's children, and the decoration sidecar carries its tag name and attributes.

Why a dialect rather than direct HTML? The promise of DMS tier 1 — element-shaped data without burying the values — applies here. Compared to HTML/XML, dms+html keeps text content in the value-column, separates structural metadata from content, preserves comments through round-trip, and inherits tier-0's small-and-strict grammar. Compared to JSON-modeled HTML (the {tag: "p", attrs: {...}, children: [...]} AST shape), it stays indent-readable and grep-friendly.

dms+html is not a serialization format for HTML — it's a DMS-shaped representation of HTML-like documents. A separate runtime / render layer converts a decoded Document_t1 to an HTML byte stream (and back) when an actual .html file is needed.
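
As a rough illustration of what such a render layer might do, the Python sketch below walks an element tree and emits HTML text. The Element class, its field names, the render function, and the partial VOID_TAGS set are all hypothetical stand-ins chosen for this example; they are not the Document_t1 API, which this document does not define.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Union

# Hypothetical stand-in for a decoded element: the tag name and attributes
# would come from the decoration sidecar, the children from the value tree.
@dataclass
class Element:
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

Node = Union[str, Element]

# Partial set of HTML void elements (no closing tag); assumed for this sketch.
VOID_TAGS = {"meta", "link", "br", "hr", "img", "input"}

def render(node: Node) -> str:
    """Serialize a text node or an Element to an HTML string."""
    if isinstance(node, str):
        return escape(node, quote=False)
    parts = [node.tag]
    for key, value in node.attrs.items():
        if value is True:        # boolean attribute that is present
            parts.append(key)
        elif value is False:     # boolean attribute that is absent
            continue
        else:
            parts.append(f'{key}="{escape(str(value))}"')
    open_tag = "<" + " ".join(parts) + ">"
    if node.tag in VOID_TAGS:
        return open_tag
    inner = "".join(render(child) for child in node.children)
    return f"{open_tag}{inner}</{node.tag}>"

page = Element("p", {"class": "lede"}, [
    "Click ",
    Element("a", {"href": "/spec.html"}, ["here"]),
    " to read the spec.",
])
print(render(page))
```

A real render layer would also need a doctype, whitespace handling, and raw-text rules for script and style, none of which are attempted here.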

Quick comparison

<html lang="en">
  <head>
    <title>DMS feature tour</title>
    <meta charset="UTF-8">
  </head>
  <body class="main">
    <h1>Welcome to DMS</h1>
    <p class="lede">
      Click <a href="/spec.html">here</a> to read the spec.
    </p>
  </body>
</html>

In dms+html:

+++
_dms_tier: 1
_dms_imports:
  + dialect: "html"
    version: "1.0.0"
+++

+ |html(lang: "en")
  + |head
    + |title "DMS feature tour"
    + |meta(charset: "UTF-8")
  + |body(class: "main")
    + |h1 "Welcome to DMS"
    + |p(class: "lede")
      + "Click "
      + |a(href: "/spec.html") "here"
      + " to read the spec."

Realistic example

A landing-page hero section — the kind of snippet every front-end project has. Native HTML first, then the dms+html equivalent, then a note on what the DMS representation gains.

Native HTML

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta name="description" content="DMS makes structured data readable.">
  <meta name="og:title" content="DMS — Document Markup System">
  <link rel="stylesheet" href="/assets/main.css">
  <link rel="preload" href="/assets/hero.webp" as="image">
  <title>DMS — Document Markup System</title>
</head>
<body>
  <header id="site-header" class="sticky top-0">
    <nav>
      <a href="/" class="logo">DMS</a>
      <ul class="nav-links">
        <li><a href="/spec.html">Spec</a></li>
        <li><a href="/dialects.html">Dialects</a></li>
        <li><a href="https://github.com/dms-lang/dms">GitHub</a></li>
      </ul>
    </nav>
  </header>

  <main>
    <section class="hero">
      <h1>Structured data that stays readable</h1>
      <p class="subheading">
        DMS is a small, strict grammar for configuration, manifests,
        and data interchange — with comments that survive round-trips.
      </p>

      <ul class="feature-list">
        <li>Type-checked named params — no silent <code>port: "80"</code> coercions</li>
        <li>Comments preserved on decode → mutate → re-emit</li>
        <li>Grep-friendly: <code>grep '|button'</code> returns every button</li>
        <li>Dialect system for HTML, HCL, KDL, RON, k8s, and more</li>
      </ul>

      <div class="cta-group">
        <a href="/spec.html">
          <button type="button" class="btn btn-primary" tabindex="0">
            Read the spec
          </button>
        </a>
        <a href="https://github.com/dms-lang/dms">
          <button type="button" class="btn btn-secondary" tabindex="0">
            View on GitHub
          </button>
        </a>
      </div>
    </section>

    <section class="features" id="features">
      <h2>Why DMS?</h2>
      <p>
        Every format solves the readability problem differently. DMS bets
        on small grammar + explicit decoration sidecar + strong types.
      </p>
    </section>
  </main>
</body>
</html>

dms+html equivalent

+++
_dms_tier: 1
_dms_imports:
  + dialect: "html"
    version: "1.0.0"
+++

# document comment — survives round-trip; dropped by browsers
+ |html(lang: "en")
  + |head
    + |meta(charset: "UTF-8")
    + |meta(name: "viewport", content: "width=device-width, initial-scale=1.0")
    + |meta(name: "description", content: "DMS makes structured data readable.")
    + |meta(name: "og:title",    content: "DMS — Document Markup System")
    + |link(rel: "stylesheet", href: "/assets/main.css")
    + |link(rel: "preload",    href: "/assets/hero.webp", as: "image")
    + |title "DMS — Document Markup System"
  + |body
    + |header(id: "site-header", class: "sticky top-0")
      + |nav
        + |a(href: "/", class: "logo") "DMS"
        + |ul(class: "nav-links")
          + |li
            + |a(href: "/spec.html") "Spec"
          + |li
            + |a(href: "/dialects.html") "Dialects"
          + |li
            + |a(href: "https://github.com/dms-lang/dms") "GitHub"
    + |main
      + |section(class: "hero")
        + |h1 "Structured data that stays readable"
        + |p(class: "subheading")
          + "DMS is a small, strict grammar for configuration, manifests,"
          + " and data interchange — with comments that survive round-trips."
        + |ul(class: "feature-list")
          # each li is a stable query target: grep '|li' finds all items
          + |li "Type-checked named params — no silent port: \"80\" coercions"
          + |li "Comments preserved on decode → mutate → re-emit"
          + |li "Grep-friendly: grep '|button' returns every button"
          + |li "Dialect system for HTML, HCL, KDL, RON, k8s, and more"
        + |div(class: "cta-group")
          + |a(href: "/spec.html")
            # tabindex: 0 is integer — the dialect type-checks it
            + |button(type: "button", class: "btn btn-primary", tabindex: 0)
              + "Read the spec"
          + |a(href: "https://github.com/dms-lang/dms")
            + |button(type: "button", class: "btn btn-secondary", tabindex: 0)
              + "View on GitHub"
      + |section(class: "features", id: "features")
        + |h2 "Why DMS?"
        + |p
          + "Every format solves the readability problem differently. DMS bets"
          + " on small grammar + explicit decoration sidecar + strong types."

What's visible in DMS that's lost or obscured in HTML

Comments round-trip. The # document comment at the top and the inline # each li is a stable query target comment survive a decode → mutate → re-emit cycle unchanged. HTML comments are never rendered by browsers, and build steps such as minification commonly strip them from the output. DMS's comment AST preserves them as first-class nodes.

Type-checked attributes. tabindex: 0 is an integer in the dialect spec (tabindex: { type: "integer" }). If you write tabindex: "zero" or tabindex: 0.5, the decoder rejects it. HTML serializes all attributes as strings; tabindex="0" and tabindex="yes" are both legal text — type errors are silent until the browser ignores or misinterprets the value.

Structured selector. Running grep '|button' on a .dms.html file finds every button element, and grep '<button' does the same for HTML. The difference shows up with compound queries: grep '|button(.*tabindex' finds buttons that carry a tabindex attribute, because in the layout used above the tag name and its params sit on a single line. HTML attributes are free to wrap across lines, so the equivalent pattern over raw HTML is fragile unless you use a parser.
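
The same line-oriented query can be scripted. The sketch below is a naive Python version, assuming (as in the examples above) that a tag and its params appear on one line and that no param value contains a closing parenthesis. The find_tags helper is hypothetical and not part of any DMS tooling.

```python
import re
from typing import List, Optional, Tuple

def find_tags(source: str, tag: str, param: Optional[str] = None) -> List[Tuple[int, str]]:
    """Return (line_number, stripped_line) pairs where |tag appears.

    If `param` is given, keep only lines whose parenthesised params
    contain a `param:` key. This is a text match, not a DMS parse.
    """
    # |tag, not followed by more name characters, then optional (params)
    tag_re = re.compile(r"\|" + re.escape(tag) + r"(?![\w-])(?:\(([^)]*)\))?")
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = tag_re.search(line)
        if match is None:
            continue
        params = match.group(1) or ""
        if param is not None and not re.search(rf"\b{re.escape(param)}\s*:", params):
            continue
        hits.append((lineno, line.strip()))
    return hits

sample = '''\
+ |a(href: "/spec.html")
  + |button(type: "button", class: "btn btn-primary", tabindex: 0)
    + "Read the spec"
+ |button(type: "submit") "Send"
'''

print(find_tags(sample, "button"))              # every button line
print(find_tags(sample, "button", "tabindex"))  # only buttons with tabindex
```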

Dialect canonical spec

+++
_dms_tier: 0
+++

name:             "html"
version:          "1.0.0"
version_strategy: "caret"          # default; npm-style ^

families:
  + name:           "tag"
    default_sigils: ["|"]
    empty_default:  []
    content_slot:   "children"
    params:
      mode: "wildcard_with_typed"
      typed:
        # ── Global attributes (any HTML element) ─────────────
        id:               { type: "string" }
        class:            { type: "string" }
        lang:             { type: "string" }
        dir:              { type: "string" }
        title:            { type: "string" }
        style:            { type: "string" }
        tabindex:         { type: "integer" }
        hidden:           { type: "boolean" }
        accesskey:        { type: "string" }
        contenteditable:  { type: "string" }
        spellcheck:       { type: "boolean" }
        draggable:        { type: "boolean" }
        translate:        { type: "string" }
        role:             { type: "string" }

        # ── Common element-specific (string-typed) ──────────
        href:             { type: "string" }
        src:              { type: "string" }
        alt:              { type: "string" }
        type:             { type: "string" }
        name:             { type: "string" }
        value:            { type: "string" }
        placeholder:      { type: "string" }
        for:              { type: "string" }
        action:           { type: "string" }
        method:           { type: "string" }
        rel:              { type: "string" }
        target:           { type: "string" }
        charset:          { type: "string" }
        content:          { type: "string" }   # literal attribute on <meta>
        as:               { type: "string" }
        crossorigin:      { type: "string" }
        integrity:        { type: "string" }
        loading:          { type: "string" }
        decoding:         { type: "string" }
        referrerpolicy:   { type: "string" }

        # ── Boolean attributes ──────────────────────────────
        disabled:         { type: "boolean" }
        checked:          { type: "boolean" }
        required:         { type: "boolean" }
        readonly:         { type: "boolean" }
        multiple:         { type: "boolean" }
        selected:         { type: "boolean" }
        autofocus:        { type: "boolean" }
        autoplay:         { type: "boolean" }
        controls:         { type: "boolean" }
        loop:             { type: "boolean" }
        muted:            { type: "boolean" }
        defer:            { type: "boolean" }
        async:            { type: "boolean" }
        open:             { type: "boolean" }
        reversed:         { type: "boolean" }
        novalidate:       { type: "boolean" }

        # ── Numeric attributes ──────────────────────────────
        width:            { type: "integer" }
        height:           { type: "integer" }
        cols:             { type: "integer" }
        rows:             { type: "integer" }
        size:             { type: "integer" }
        maxlength:        { type: "integer" }
        minlength:        { type: "integer" }
        step:             { type: "integer" }
        min:              { type: "integer" }
        max:              { type: "integer" }
        span:             { type: "integer" }
        colspan:          { type: "integer" }
        rowspan:          { type: "integer" }
        start:            { type: "integer" }

The wildcard_with_typed mode means: the listed keys are type-checked, but any other key passes through unchecked. That covers:

Tier-1 ports decode any of these as plain string params; the runtime layer interprets them.
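
A minimal Python sketch of the wildcard_with_typed check follows. The TYPED table is a small subset copied from the spec above; the check_params function and the shape of its error messages are assumptions made for illustration, since this document does not specify how a decoder reports type errors.

```python
# Subset of the typed table from the dialect spec above.
TYPED = {
    "id": "string",
    "class": "string",
    "href": "string",
    "tabindex": "integer",
    "width": "integer",
    "hidden": "boolean",
    "disabled": "boolean",
}

def matches(value, expected: str) -> bool:
    """Check one value against a declared type name."""
    if expected == "string":
        return isinstance(value, str)
    if expected == "boolean":
        return isinstance(value, bool)
    if expected == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, int) and not isinstance(value, bool)
    raise ValueError(f"unknown type name: {expected}")

def check_params(params: dict) -> list:
    """Return a list of error strings; an empty list means the params pass."""
    errors = []
    for key, value in params.items():
        expected = TYPED.get(key)
        if expected is None:
            continue  # wildcard: unlisted keys pass through unchecked
        if not matches(value, expected):
            errors.append(f"{key}: expected {expected}, got {value!r}")
    return errors

print(check_params({"tabindex": 0, "class": "btn"}))   # no errors
print(check_params({"tabindex": "zero"}))              # one error
print(check_params({"data-id": "42", "tabindex": 0.5}))  # one error, data-id is unchecked
```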

Content semantics

The tag family's content is a list of children. A child is:

The list's three equivalent source forms (mutually exclusive on any single element):

Block form (most common)

+ |p(class: "lede")
  + "Click "
  + |a(href: "/spec.html") "here"
  + " to read."

Each + item under the indented child block becomes an element of the children list.

Flow form

+ |p(class: "lede", children: ["Click ", |a(href: "/spec.html") "here", " to read."])

The children: param is hoisted into the value tree on decode. Useful for short inline runs.

Inline base_value form (single-child shorthand)

+ |title "DMS feature tour"
+ |h1 "Welcome to DMS"

Equivalent to |title(children: ["DMS feature tour"]) — the inline scalar is wrapped in a single-element children list.

When no form is used

+ |meta(charset: "UTF-8")
+ |br
+ |hr

No children: the value tree gets the family's empty_default ([]), an empty children list. This is the form for void (self-closing) elements.

Conflict

Specifying content via two of these forms on the same element is a parse error (the tier-1 conflict rule applies). Pick one.
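
The four cases above (block, flow, inline, none) and the conflict rule can be sketched in Python as a single normalization step. The resolve_children function and its argument names are hypothetical; a real decoder works on parsed source, not on pre-split arguments like these.

```python
def resolve_children(params: dict, block_items=None, inline_value=None):
    """Normalize the content forms into (remaining_params, children).

    block_items  -- list of '+' items from an indented child block, or None
    inline_value -- scalar written after the tag on the same line, or None
    params       -- named params; a 'children' key is the flow form
    """
    forms = []
    if block_items is not None:
        forms.append(("block", list(block_items)))
    if "children" in params:
        forms.append(("flow", list(params["children"])))
    if inline_value is not None:
        forms.append(("inline", [inline_value]))  # wrap the scalar in a list

    if len(forms) > 1:
        names = ", ".join(name for name, _ in forms)
        raise ValueError(f"content given in more than one form: {names}")

    # children is hoisted out of the params into the value tree
    remaining = {k: v for k, v in params.items() if k != "children"}
    children = forms[0][1] if forms else []  # [] plays the role of empty_default
    return remaining, children

print(resolve_children({"class": "lede"}, block_items=["Click ", " to read."]))
print(resolve_children({}, inline_value="DMS feature tour"))
print(resolve_children({"charset": "UTF-8"}))
```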

Encoder canonical form

The encoder emits each tag using a deterministic heuristic (inherited from TIER1.md §"Encoder canonical form"):

Tag inventory (representative)

dms+html validates parameters at the family level, not per-tag. The decoder accepts any tag name; per-tag rules (<input> requires type, <img> requires src and alt, etc.) live in the runtime / render layer, not in the DMS decoder.
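
To make the split concrete, here is a small Python sketch of a per-tag check that a runtime layer might run after decoding. The REQUIRED_ATTRS table only mirrors the two examples named above, and the missing_attrs helper is hypothetical; neither is defined by this spec.

```python
# Per-tag rules taken from the examples in the paragraph above.
# A real runtime layer would carry a much larger table.
REQUIRED_ATTRS = {
    "input": {"type"},
    "img": {"src", "alt"},
}

def missing_attrs(tag: str, attrs: dict) -> list:
    """Return the required attribute names that `attrs` lacks, sorted."""
    required = REQUIRED_ATTRS.get(tag, set())  # unknown tags have no rules
    return sorted(required - set(attrs))

print(missing_attrs("img", {"src": "/assets/hero.webp"}))  # alt is missing
print(missing_attrs("input", {"type": "text"}))            # nothing missing
print(missing_attrs("my-widget", {}))                      # no rules for this tag
```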

Representative tags covered by this first cut:

Tag        Content shape      Notable attrs
---------  -----------------  ------------------------------------
html       block container    lang, dir
head       block container    (global only)
body       block container    class, id
title      inline-text        (global only)
meta       self-closing       charset, name, content, http-equiv
link       self-closing       rel, href, type, as
script     text-content       src, type, defer, async
style      text-content       media, type
h1-h6      inline / mixed     (global only)
p          mixed content      class, id
div        mixed content      (global only)
span       inline mixed       (global only)
a          inline mixed       href, target, rel
ul/ol      list of li         start, reversed
li         mixed content      value
img        self-closing       src, alt, width, height, loading
br/hr      self-closing       (global only)
form       block container    action, method, enctype
input      self-closing       type, name, value, placeholder
button     inline mixed       type, disabled
table      block container    (global only)
tr/td/th   block / mixed      colspan, rowspan

This is not the full HTML5 element registry — it's a sampling to cover the common shapes (block container, mixed content, self-closing, text-content). Any HTML5 tag works regardless of whether it appears here, since the dialect doesn't validate per-tag.

Versioning

dms+html follows semver. The first published version is 1.0.0. The default match strategy is caret (^1.0.0 matches any 1.x.x ≥ 1.0.0).
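
A short Python sketch of caret matching follows. The 1.x behaviour is the one stated above; the handling of 0.x versions follows npm's caret convention, which this spec does not spell out, so treat it as an assumption. The function names are hypothetical, and pre-release tags are not handled.

```python
def parse_version(text: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of three ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)

def caret_matches(requirement: str, candidate: str) -> bool:
    """True if `candidate` satisfies ^requirement (npm-style caret)."""
    req = parse_version(requirement)
    cand = parse_version(candidate)
    if cand < req:
        return False
    # The upper bound bumps the leftmost non-zero component.
    if req[0] > 0:
        upper = (req[0] + 1, 0, 0)
    elif req[1] > 0:
        upper = (0, req[1] + 1, 0)   # assumption: npm rule for 0.x
    else:
        upper = (0, 0, req[2] + 1)   # assumption: npm rule for 0.0.x
    return cand < upper

print(caret_matches("1.0.0", "1.4.2"))  # True
print(caret_matches("1.0.0", "2.0.0"))  # False
```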

What counts as a version bump:

File extension

dms+html documents use the .dms.html extension. The double extension lets editors and command-line tools pick out dms+html files by name alone (for example, with a *.dms.html glob), without parsing front matter:

my-page.dms.html       # dms+html document
my-config.dms          # plain DMS (tier 0 or tier 1 with non-html dialects)

A dms+html document must declare _dms_tier: 1 and import the html dialect in _dms_imports regardless of file extension; the extension is a tooling hint, not a binding declaration.
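
Because the extension is only a hint, a tool that needs certainty has to look at the front matter. The Python sketch below does a naive text check based on the front-matter layout shown in the examples above. It is not a DMS parser: it does not verify that the dialect entry sits under _dms_imports, and the function names are hypothetical.

```python
import re

def front_matter(text: str) -> list:
    """Return the lines between the first two '+++' fences, or []."""
    lines = text.splitlines()
    fences = [i for i, line in enumerate(lines) if line.strip() == "+++"]
    if len(fences) < 2:
        return []
    return lines[fences[0] + 1 : fences[1]]

def declares_dms_html(text: str) -> bool:
    """Naive check: tier 1 declared and an html dialect entry present."""
    fm = [line.strip() for line in front_matter(text)]
    has_tier = any(re.fullmatch(r"_dms_tier:\s*1", line) for line in fm)
    has_import = any(re.fullmatch(r'\+?\s*dialect:\s*"html"', line) for line in fm)
    return has_tier and has_import

doc = "\n".join([
    "+++",
    "_dms_tier: 1",
    "_dms_imports:",
    '  + dialect: "html"',
    '    version: "1.0.0"',
    "+++",
    "",
    '+ |title "DMS feature tour"',
])
print(declares_dms_html(doc))
```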

Worked example

A complete dms+html document with the corresponding value tree and decoration sidecar is in TIER1.md §"Worked example: HTML in dms+html".

What's not in scope (this dialect)

Open questions for v0.2+