dms+html — DMS dialect for HTML
Brand: dms+html
File extension: .dms.html
Spec version: 0.1 (draft)
DMS tier required: 1 (_dms_tier: 1 in front matter)
Parent spec: TIER1.md
Tier-0 base: SPEC.md
Pre-1.0: breaking changes are still possible. No version-bump rules apply yet.
What this is
dms+html is a tier-1 dialect that lets you write HTML-shaped
documents in DMS. Each |tag(...) call produces a single HTML
element; the value tree carries the element's children, the
decoration sidecar carries its tag name and attributes.
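To make that split concrete, here is a minimal Python sketch of how a |p(class: "lede") paragraph with one nested |a might look once decoded. The dictionary keys, the tuple paths, and the flatten_text helper are illustrative assumptions; the real Document_t1 layout is defined in TIER1.md and may differ.

```python
# Hypothetical decoded shapes; NOT the actual Document_t1 layout from TIER1.md.

# Value tree: content only. Each element is represented by the list of its children.
value_tree = [
    [                          # children of |p
        "Click ",
        ["here"],              # children of |a
        " to read the spec.",
    ],
]

# Decoration sidecar: tag names and attributes, keyed by path into the value tree.
sidecar = {
    (0,): {"tag": "p", "params": {"class": "lede"}},
    (0, 1): {"tag": "a", "params": {"href": "/spec.html"}},
}


def flatten_text(node):
    """Collect text content from the value tree alone, ignoring the sidecar."""
    if isinstance(node, list):
        return "".join(flatten_text(child) for child in node)
    return str(node)


print(flatten_text(value_tree))
```

The point of the sketch is that the text is reachable without consulting the sidecar at all, which is what keeping content in the value column buys.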
Why a dialect rather than direct HTML? The promise of DMS
tier 1 — element-shaped data without burying the values — applies
here. Compared to HTML/XML, dms+html keeps text content in the
value-column, separates structural metadata from content,
preserves comments through round-trip, and inherits tier-0's
small-and-strict grammar. Compared to JSON-modeled HTML (the
{tag: "p", attrs: {...}, children: [...]} AST shape), it stays
indent-readable and grep-friendly.
dms+html is not a serialization format for HTML — it's a
DMS-shaped representation of HTML-like documents. A separate
runtime / render layer converts a decoded Document_t1 to an
HTML byte stream (and back) when an actual .html file is
needed.
Quick comparison
<html lang="en">
<head>
<title>DMS feature tour</title>
<meta charset="UTF-8">
</head>
<body class="main">
<h1>Welcome to DMS</h1>
<p class="lede">
Click <a href="/spec.html">here</a> to read the spec.
</p>
</body>
</html>
In dms+html:
+++
_dms_tier: 1
_dms_imports:
  + dialect: "html"
    version: "1.0.0"
+++
+ |html(lang: "en")
  + |head
    + |title "DMS feature tour"
    + |meta(charset: "UTF-8")
  + |body(class: "main")
    + |h1 "Welcome to DMS"
    + |p(class: "lede")
      + "Click "
      + |a(href: "/spec.html") "here"
      + " to read the spec."
Realistic example
A landing-page hero section — the kind of snippet every front-end project has. Native HTML first, then the dms+html equivalent, then a note on what the DMS representation gains.
Native HTML
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="DMS makes structured data readable.">
<meta name="og:title" content="DMS — Document Markup System">
<link rel="stylesheet" href="/assets/main.css">
<link rel="preload" href="/assets/hero.webp" as="image">
<title>DMS — Document Markup System</title>
</head>
<body>
<header id="site-header" class="sticky top-0">
<nav>
<a href="/" class="logo">DMS</a>
<ul class="nav-links">
<li><a href="/spec.html">Spec</a></li>
<li><a href="/dialects.html">Dialects</a></li>
<li><a href="https://github.com/dms-lang/dms">GitHub</a></li>
</ul>
</nav>
</header>
<main>
<section class="hero">
<h1>Structured data that stays readable</h1>
<p class="subheading">
DMS is a small, strict grammar for configuration, manifests,
and data interchange — with comments that survive round-trips.
</p>
<ul class="feature-list">
<li>Type-checked named params — no silent <code>port: "80"</code> coercions</li>
<li>Comments preserved on decode → mutate → re-emit</li>
<li>Grep-friendly: <code>grep '|button'</code> returns every button</li>
<li>Dialect system for HTML, HCL, KDL, RON, k8s, and more</li>
</ul>
<div class="cta-group">
<a href="/spec.html">
<button type="button" class="btn btn-primary" tabindex="0">
Read the spec
</button>
</a>
<a href="https://github.com/dms-lang/dms">
<button type="button" class="btn btn-secondary" tabindex="0">
View on GitHub
</button>
</a>
</div>
</section>
<section class="features" id="features">
<h2>Why DMS?</h2>
<p>
Every format solves the readability problem differently. DMS bets
on small grammar + explicit decoration sidecar + strong types.
</p>
</section>
</main>
</body>
</html>
dms+html equivalent
+++
_dms_tier: 1
_dms_imports:
  + dialect: "html"
    version: "1.0.0"
+++
# document comment — survives round-trip; dropped by browsers
+ |html(lang: "en")
  + |head
    + |meta(charset: "UTF-8")
    + |meta(name: "viewport", content: "width=device-width, initial-scale=1.0")
    + |meta(name: "description", content: "DMS makes structured data readable.")
    + |meta(name: "og:title", content: "DMS — Document Markup System")
    + |link(rel: "stylesheet", href: "/assets/main.css")
    + |link(rel: "preload", href: "/assets/hero.webp", as: "image")
    + |title "DMS — Document Markup System"
  + |body
    + |header(id: "site-header", class: "sticky top-0")
      + |nav
        + |a(href: "/", class: "logo") "DMS"
        + |ul(class: "nav-links")
          + |li
            + |a(href: "/spec.html") "Spec"
          + |li
            + |a(href: "/dialects.html") "Dialects"
          + |li
            + |a(href: "https://github.com/dms-lang/dms") "GitHub"
    + |main
      + |section(class: "hero")
        + |h1 "Structured data that stays readable"
        + |p(class: "subheading")
          + "DMS is a small, strict grammar for configuration, manifests,"
          + " and data interchange — with comments that survive round-trips."
        + |ul(class: "feature-list")
          # each li is a stable query target: grep '|li' finds all items
          + |li "Type-checked named params — no silent port: \"80\" coercions"
          + |li "Comments preserved on decode → mutate → re-emit"
          + |li "Grep-friendly: grep '|button' returns every button"
          + |li "Dialect system for HTML, HCL, KDL, RON, k8s, and more"
        + |div(class: "cta-group")
          + |a(href: "/spec.html")
            # tabindex: 0 is integer — the dialect type-checks it
            + |button(type: "button", class: "btn btn-primary", tabindex: 0)
              "Read the spec"
          + |a(href: "https://github.com/dms-lang/dms")
            + |button(type: "button", class: "btn btn-secondary", tabindex: 0)
              "View on GitHub"
      + |section(class: "features", id: "features")
        + |h2 "Why DMS?"
        + |p
          + "Every format solves the readability problem differently. DMS bets"
          + " on small grammar + explicit decoration sidecar + strong types."
What's visible in DMS that's lost or obscured in HTML
Comments round-trip. The # document comment at the top and the
# each li is a stable query target inline comment survive encode → decode
→ re-emit unchanged. Browsers keep HTML comments in the DOM but never
render them, and minifiers routinely strip them from the served file.
DMS's comment AST preserves them as first-class nodes.
Type-checked attributes. tabindex: 0 is an integer in the dialect
spec (tabindex: { type: "integer" }). If you write tabindex: "zero" or
tabindex: 0.5, the decoder rejects it. HTML serializes all attributes as
strings; tabindex="0" and tabindex="yes" are both legal text — type
errors are silent until the browser ignores or misinterprets the value.
Structured selector. Running grep '|button' on a .dms.html file
finds every button element. The equivalent HTML grep — grep -E '<button'
— also works, but grep '|button(.*tabindex' finds buttons with a
tabindex attribute: a structured predicate on two decoration fields at
once, which grep over raw HTML cannot do without either a parser or a
fragile regex.
Dialect canonical spec
+++
_dms_tier: 0
+++
name: "html"
version: "1.0.0"
version_strategy: "caret" # default; npm-style ^
families:
  + name: "tag"
    default_sigils: ["|"]
    empty_default: []
    content_slot: "children"
    params:
      mode: "wildcard_with_typed"
      typed:
        # ── Global attributes (any HTML element) ─────────────
        id: { type: "string" }
        class: { type: "string" }
        lang: { type: "string" }
        dir: { type: "string" }
        title: { type: "string" }
        style: { type: "string" }
        tabindex: { type: "integer" }
        hidden: { type: "boolean" }
        accesskey: { type: "string" }
        contenteditable: { type: "string" }
        spellcheck: { type: "boolean" }
        draggable: { type: "boolean" }
        translate: { type: "string" }
        role: { type: "string" }
        # ── Common element-specific (string-typed) ──────────
        href: { type: "string" }
        src: { type: "string" }
        alt: { type: "string" }
        type: { type: "string" }
        name: { type: "string" }
        value: { type: "string" }
        placeholder: { type: "string" }
        for: { type: "string" }
        action: { type: "string" }
        method: { type: "string" }
        rel: { type: "string" }
        target: { type: "string" }
        charset: { type: "string" }
        content: { type: "string" } # literal attribute on <meta>
        as: { type: "string" }
        crossorigin: { type: "string" }
        integrity: { type: "string" }
        loading: { type: "string" }
        decoding: { type: "string" }
        referrerpolicy: { type: "string" }
        # ── Boolean attributes ──────────────────────────────
        disabled: { type: "boolean" }
        checked: { type: "boolean" }
        required: { type: "boolean" }
        readonly: { type: "boolean" }
        multiple: { type: "boolean" }
        selected: { type: "boolean" }
        autofocus: { type: "boolean" }
        autoplay: { type: "boolean" }
        controls: { type: "boolean" }
        loop: { type: "boolean" }
        muted: { type: "boolean" }
        defer: { type: "boolean" }
        async: { type: "boolean" }
        open: { type: "boolean" }
        reversed: { type: "boolean" }
        novalidate: { type: "boolean" }
        # ── Numeric attributes ──────────────────────────────
        width: { type: "integer" }
        height: { type: "integer" }
        cols: { type: "integer" }
        rows: { type: "integer" }
        size: { type: "integer" }
        maxlength: { type: "integer" }
        minlength: { type: "integer" }
        step: { type: "integer" }
        min: { type: "integer" }
        max: { type: "integer" }
        span: { type: "integer" }
        colspan: { type: "integer" }
        rowspan: { type: "integer" }
        start: { type: "integer" }
The wildcard_with_typed mode means: the listed keys are
type-checked, but any other key passes through unchecked.
That covers:
- data-* attributes (custom data)
- aria-* attributes (accessibility — ~50 attrs, all strings, not worth enumerating)
- on* event handlers (onclick, onload, etc.)
- Framework-specific attributes (x-* for Alpine, wire:* for Livewire, etc.)
- New HTML attributes added in future revisions
Tier-1 ports decode any of these as plain string params; the runtime layer interprets them.
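The wildcard_with_typed behaviour can be sketched in a few lines of Python. This is an illustration under assumptions, not the decoder's implementation: TYPED is a small excerpt of the typed table above, and check_params and its error strings are hypothetical names.

```python
# Hypothetical validator; TYPED is an excerpt of the dialect's typed table.
TYPED = {"id": str, "class": str, "tabindex": int, "hidden": bool, "width": int}


def check_params(params):
    """Return a list of error messages; keys not in TYPED pass through unchecked."""
    errors = []
    for key, value in params.items():
        expected = TYPED.get(key)
        if expected is None:
            continue  # wildcard: data-*, aria-*, on*, framework attrs, ...
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        if expected is int and isinstance(value, bool):
            ok = False
        else:
            ok = isinstance(value, expected)
        if not ok:
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


print(check_params({"tabindex": 0, "data-id": "42"}))
print(check_params({"tabindex": "zero"}))
```

A typed key with the wrong type is reported, while data-id is never inspected, which matches the pass-through rule described above.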
Content semantics
The tag family's content is a list of children. A child is:
- A scalar (string, number, boolean, datetime) — text content.
- A nested decorated value — another |tag call, which becomes a child element.
The list has three equivalent source forms, mutually exclusive on any single element:
Block form (most common)
+ |p(class: "lede")
  + "Click "
  + |a(href: "/spec.html") "here"
  + " to read."
Each + item under the indented child block becomes an element
of the children list.
Flow form
+ |p(class: "lede", children: ["Click ", |a(href: "/spec.html") "here", " to read."])
The children: param is hoisted into the value tree on decode.
Useful for short inline runs.
Inline base_value form (single-child shorthand)
+ |title "DMS feature tour"
+ |h1 "Welcome to DMS"
Equivalent to |title(children: ["DMS feature tour"]) — the
inline scalar is wrapped in a single-element children list.
When no form is used
+ |meta(charset: "UTF-8")
+ |br
+ |hr
No children. Value tree gets the family's empty_default ([])
— empty children list. This is the form for self-closing /
void elements.
Conflict
Specifying content via two of these forms on the same element is a parse error (the tier-1 conflict rule applies). Pick one.
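The three forms, the empty default, and the conflict rule can be pictured as one normalisation step. The Python below is a sketch under assumptions: resolve_children, its arguments, and the sentinel are hypothetical names, and the actual rule lives in TIER1.md.

```python
# Hypothetical decode-time normalisation; names are illustrative only.
EMPTY_DEFAULT = []   # the family's empty_default
_MISSING = object()  # sentinel meaning "this form was not used"


def resolve_children(params, inline=_MISSING, block=_MISSING):
    """Return (children, remaining_params) for one |tag call.

    params -- decoded named params, possibly holding a flow-form "children" key
    inline -- scalar from the inline base_value form, if present
    block  -- list of "+" items from the block form, if present
    """
    flow = params.get("children", _MISSING)
    used = [form for form in (inline, block, flow) if form is not _MISSING]
    if len(used) > 1:
        raise ValueError("content given in more than one form")
    rest = {k: v for k, v in params.items() if k != "children"}
    if inline is not _MISSING:
        return [inline], rest          # single scalar wrapped in a list
    if block is not _MISSING:
        return list(block), rest
    if flow is not _MISSING:
        return list(flow), rest        # children: hoisted out of the params
    return list(EMPTY_DEFAULT), rest   # no form used: empty children list


print(resolve_children({}, inline="DMS feature tour"))
print(resolve_children({"charset": "UTF-8"}))
```

Whichever form the source used, the caller sees the same shape: a children list plus the params that remain after hoisting.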
Encoder canonical form
The encoder emits each tag using a deterministic heuristic (inherited from TIER1.md §"Encoder canonical form"):
- Inline base_value form when children is a single scalar
  (|h1 "...", |li "...").
- Block form when children has > 1 element OR any element is a
  nested non-scalar.
- Flow children: form when all elements are scalars / decorated
  scalars and the line fits the width threshold (typical
  mixed-text-with-spans paragraphs).
- No content emission when children matches the empty default []
  (self-closing / void elements).
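One possible reading of the heuristic above, as a Python sketch. Several things here are assumptions rather than facts from TIER1.md: children are modelled as scalars or dicts with a "children" key, the width threshold defaults to 80, the caller supplies the length the flow line would have, and the flow check runs before the block fallback (the list above does not state a precedence when both could apply). Datetime scalars are left out for brevity.

```python
def is_scalar(value):
    """Strings, numbers and booleans count as scalars in this sketch."""
    return isinstance(value, (str, int, float, bool))


def is_decorated_scalar(value):
    """A nested element whose only child is a scalar, e.g. |a(...) "here"."""
    return (
        isinstance(value, dict)
        and len(value["children"]) == 1
        and is_scalar(value["children"][0])
    )


def choose_form(children, flow_line_length, width=80):
    """Pick "none", "inline", "flow" or "block" for one element's children."""
    if children == []:
        return "none"                      # matches empty_default
    if len(children) == 1 and is_scalar(children[0]):
        return "inline"                    # |h1 "..."
    all_flat = all(is_scalar(c) or is_decorated_scalar(c) for c in children)
    if all_flat and flow_line_length <= width:
        return "flow"                      # assumed to take priority over block
    return "block"


mixed = ["Click ", {"tag": "a", "children": ["here"]}, " to read."]
print(choose_form(mixed, flow_line_length=70))
print(choose_form(mixed, flow_line_length=120))
```

Because every branch depends only on the children and the line length, the same document always re-emits in the same form, which is what makes the encoder output deterministic.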
Tag inventory (representative)
dms+html validates parameters at the family level, not
per-tag. The decoder accepts any tag name; per-tag rules
(<input> requires type, <img> requires src and alt,
etc.) live in the runtime / render layer, not in the DMS
decoder.
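A runtime-layer check of this kind might look like the following Python sketch. REQUIRED_ATTRS and missing_required are hypothetical names; the two rules are just the examples mentioned above, not a complete or official rule set.

```python
# Hypothetical runtime-layer rules, outside the DMS decoder.
REQUIRED_ATTRS = {
    "input": {"type"},
    "img": {"src", "alt"},
}


def missing_required(tag, params):
    """Return the sorted required attributes that params lacks for this tag."""
    return sorted(REQUIRED_ATTRS.get(tag, set()) - set(params))


print(missing_required("img", {"src": "/assets/hero.webp"}))
print(missing_required("my-component", {}))
```

Tags without an entry are accepted as-is, mirroring the decoder's any-tag policy while letting the render layer be stricter where it chooses.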
Representative tags covered by this first cut:
| Tag | Content shape | Notable attrs |
|---|---|---|
| html | block container | lang, dir |
| head | block container | (global only) |
| body | block container | class, id |
| title | inline-text | (global only) |
| meta | self-closing | charset, name, content, http-equiv |
| link | self-closing | rel, href, type, as |
| script | text-content | src, type, defer, async |
| style | text-content | media, type |
| h1–h6 | inline / mixed | (global only) |
| p | mixed content | class, id |
| div | mixed content | (global) |
| span | inline mixed | (global) |
| a | inline mixed | href, target, rel |
| ul/ol | list of li | start, reversed |
| li | mixed content | value |
| img | self-closing | src, alt, width, height, loading |
| br/hr | self-closing | (global only) |
| form | block container | action, method, enctype |
| input | self-closing | type, name, value, placeholder |
| button | inline mixed | type, disabled |
| table | block container | (global) |
| tr/td/th | block / mixed | colspan, rowspan |
This is not the full HTML5 element registry — it's a sampling to cover the common shapes (block container, mixed content, self-closing, text-content). Any HTML5 tag works regardless of whether it appears here, since the dialect doesn't validate per-tag.
Versioning
dms+html follows semver. The first published version is 1.0.0.
The default match strategy is caret (^1.0.0 matches any
1.x.x ≥ 1.0.0).
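As a rough illustration of caret matching, the Python sketch below handles plain MAJOR.MINOR.PATCH strings with a major version of 1 or higher. The function names are hypothetical, and pre-release tags and the special npm rules for 0.x versions are deliberately not covered.

```python
def parse(version):
    """Split "1.4.2" into the tuple (1, 4, 2)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def caret_matches(required, candidate):
    """True if candidate satisfies ^required (same major, not older)."""
    req, cand = parse(required), parse(candidate)
    return cand[0] == req[0] and cand >= req


print(caret_matches("1.0.0", "1.4.2"))
print(caret_matches("1.0.0", "2.0.0"))
```

Tuple comparison does the ordering work here: (1, 4, 2) >= (1, 0, 0) compares major, then minor, then patch.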
What counts as a version bump:
- Breaking (major). Changing the content_slot name; renaming
  a typed attribute; changing a typed attribute's type;
  switching modes from wildcard_with_typed to strict; removing a
  typed attribute (since previously-conformant docs may rely on
  its typing).
- Additive (minor). Adding a new typed attribute. Because
  mode is wildcard_with_typed, unknown attrs were previously
  passing as wildcards; adding them as typed only tightens
  validation, but a doc that wrote a wrong-typed value would now
  fail. Considered a minor bump under semver because it's an
  added constraint, not a removed feature.
- Bug fix (patch). Documentation clarifications, correcting a
  typo in a typed attribute's name (with deprecation), fixing
  inconsistencies in the dialect spec text.
File extension
dms+html documents use the .dms.html extension. The double
extension lets editors and grep dispatch on the secondary
extension without parsing front matter:
my-page.dms.html # dms+html document
my-config.dms # plain DMS (tier 0 or tier 1 with non-html dialects)
A dms+html document must declare _dms_tier: 1 and import
the html dialect in _dms_imports regardless of file
extension; the extension is a tooling hint, not a binding
declaration.
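A tool that dispatches on the double extension could do something like the Python sketch below. guess_kind and its return labels are hypothetical; since the extension is only a hint, a real tool would still read the front matter before trusting the result.

```python
from pathlib import PurePosixPath


def guess_kind(filename):
    """Classify a file by extension only; the front matter stays authoritative."""
    suffixes = PurePosixPath(filename).suffixes
    if suffixes[-2:] == [".dms", ".html"]:
        return "dms+html"
    if suffixes[-1:] == [".dms"]:
        return "dms"
    return "other"


print(guess_kind("my-page.dms.html"))
print(guess_kind("my-config.dms"))
print(guess_kind("index.html"))
```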
Worked example
A complete dms+html document with the corresponding value tree and decoration sidecar is in TIER1.md §"Worked example: HTML in dms+html".
What's not in scope (this dialect)
- Rendering / serialization to HTML byte stream. That's the
  runtime / render layer. Producing an .html file from a decoded
  Document_t1 is a separate tool / library.
- Per-tag validation. "input requires type", "img requires src
  and alt", "table cells must be inside tr", etc. — runtime layer
  concerns. The DMS decoder accepts any tag with any attrs.
- Custom elements / web components. Tag names with - (e.g.,
  my-component) work because - is a valid character in tier-0
  bare keys, but the dialect does no special handling — they're
  just tags.
- SVG, MathML, foreign content. Each gets its own dialect
  if/when a real need surfaces (dms+svg, dms+mathml).
- Doctype. dms+html documents don't carry a doctype; the
  render layer prepends <!DOCTYPE html> when emitting HTML5.
- Script / style content escaping. The dialect treats script and
  style element content as plain strings; the render layer is
  responsible for any CDATA-style handling needed for HTML
  serialization.
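To show where these responsibilities would sit, here is a deliberately small Python sketch of a render layer. Everything about it is an assumption: the node shape (dicts with tag, params, children), the function names, and the choice to emit script and style text unescaped. Attribute values are assumed to be strings or numbers here; booleans would need their own handling.

```python
from html import escape

VOID = {"meta", "link", "br", "hr", "img", "input"}  # subset of HTML void elements
RAW_TEXT = {"script", "style"}                       # content emitted unescaped


def render(node, parent_tag=None):
    """Render one hypothetical node (or scalar child) to an HTML string."""
    if not isinstance(node, dict):                   # scalar child: text content
        text = str(node)
        return text if parent_tag in RAW_TEXT else escape(text, quote=False)
    tag = node["tag"]
    attrs = "".join(
        f' {key}="{escape(str(value))}"'
        for key, value in node.get("params", {}).items()
    )
    if tag in VOID:
        return f"<{tag}{attrs}>"                     # no children, no closing tag
    inner = "".join(render(child, tag) for child in node.get("children", []))
    return f"<{tag}{attrs}>{inner}</{tag}>"


def render_document(root):
    """Prepend the HTML5 doctype, which the DMS document itself never carries."""
    return "<!DOCTYPE html>\n" + render(root)


para = {
    "tag": "p",
    "params": {"class": "lede"},
    "children": ["Click ", {"tag": "a", "params": {"href": "/spec.html"}, "children": ["here"]}, " to read."],
}
print(render_document(para))
```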
Open questions for v0.2+
- Per-tag empty defaults. Some elements are pure
  self-closing (<meta>, <link>, <br>, <hr>) and have no
  meaningful "empty default of []" — they're always empty.
  Currently the family-level empty_default: [] covers them, but a
  future revision could add per-tag overrides if a render layer
  wants tighter signaling.
- Boolean attribute serialization. HTML serializes boolean
  true as the bare attribute name (<input disabled>) and false by
  omission. The dialect carries booleans as plain values; this
  convention belongs in the render layer.
- Attribute order preservation. Tier-1's path-keyed sidecar
  preserves source order on decode; encoder canonical form
  re-emits in source order. HTML serialization may want a
  different order (e.g., alphabetical) — render layer concern.
- Inline expressions. ${var} interpolation, JSX-style {expr},
  server-side templating syntax — none of these are in dms+html.
  A dms+jsx or dms+templating dialect could add them.
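The boolean and ordering questions above are small enough to sketch. The Python below is one way a render layer might handle them; serialize_attrs and its sort_keys flag are hypothetical, and nothing here is settled by the dialect.

```python
from html import escape


def serialize_attrs(params, sort_keys=False):
    """Render params as an HTML attribute string (leading space included)."""
    # Python dicts keep insertion order, standing in for source order here.
    keys = sorted(params) if sort_keys else list(params)
    parts = []
    for key in keys:
        value = params[key]
        if value is True:
            parts.append(f" {key}")                  # bare attribute name
        elif value is False:
            continue                                 # false: omitted entirely
        else:
            parts.append(f' {key}="{escape(str(value))}"')
    return "".join(parts)


print(serialize_attrs({"type": "checkbox", "disabled": True, "checked": False}))
print(serialize_attrs({"type": "button", "class": "btn", "tabindex": 0}, sort_keys=True))
```

The identity checks (is True, is False) matter: tabindex: 0 is an integer, not a boolean, so it must be emitted as tabindex="0" rather than dropped.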