A config file format for humans.

DMS — Data Meta Syntax — aims to be a data syntax with YAML's clean look, TOML's small strict spec, and one extra superpower: comments survive parse → modify → re‑emit, in every reference parser, in every language.

+++
title:     "DMS feature tour"
version:   "1.0.0"
updated:   2026-04-24T09:30:00-04:00
+++

# Line comments take # or //, your pick.
// Bare keys allow full Unicode. Heredocs are first‑class.

database:
  host:     "db.internal"
  port:     5432                  # raised after the LB change
  pool:     { size: 10, idle_timeout_s: 30 }

servers:
  + name: "web1"
    disks:
      + mount: "/"
        size_gb: 100
      + mount: "/var"
        size_gb: 500
  + name: "web2"

sql: """SQL _trim("\n", ">")
    SELECT id, email
      FROM users
     WHERE active = true
    SQL

regions: ["us-east-1", "eu-west-1", "ap-south-1"]

Indent-based, one rule

Siblings under a parent must share one indent width. Tabs are banned in structural indent. That is the entire indent specification.
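
To make the rule concrete, here is a minimal Python sketch of an indent checker. It is a toy, not one of the reference parsers: the name check_indent is hypothetical, and it ignores heredoc bodies, block comments, and whether a line is actually allowed to have children. It only shows the two checks stated above: no tabs in leading whitespace, and every line must land on an indent width that is already open (or open a deeper one).

```python
def check_indent(source: str) -> None:
    """Raise ValueError on a tab in leading whitespace or an indent that matches no open level."""
    open_levels = [0]  # indent widths of the currently open blocks, outermost first
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no structure
        leading = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in leading:
            raise ValueError(f"line {lineno}: tab in structural indent")
        width = len(leading)
        if width > open_levels[-1]:
            # The first child of a new block fixes that block's indent width.
            open_levels.append(width)
            continue
        while open_levels[-1] > width:
            open_levels.pop()  # a dedent closes inner blocks
        if open_levels[-1] != width:
            raise ValueError(f"line {lineno}: indent {width} matches no sibling")
```

A block whose first child sits at four spaces and whose next line sits at two fails the check, because two matches neither the child level nor the parent level.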

Distinct types, never inferred

Quoted is a string. Bare digits are a number. true is a boolean. There is no NO-becomes-false, no octal-from-leading-zero, no Norway problem.

Comments are first-class

Leading, trailing, and floating comments are AST nodes. They survive parse, mutation, and emit — by spec, in every reference parser.

Polymorphic root

A document can be a table, a list, a scalar, or empty. Want to serialize a list? Just write a list — no wrapper key, no ceremony.

Heredocs with modifiers

Labelled heredocs in basic and literal flavors, with chainable modifiers like _trim and _fold_paragraphs. Every YAML block-scalar mode is one combo away.

Thirteen first-party parsers

Rust, C, Go, Zig, Python, Perl, JavaScript, C#, Ruby, Java, PHP, Lua, Crystal. All hold at 4695 / 4695 on the shared conformance corpus.

Why a new config format?

Same data, three formats. The differences aren't cosmetic.

DMS
servers:
  + name: "web1"
    disks:
      + mount: "/"
      + mount: "/var"
  + name: "web2"

countries:  ["US", "UK", "NO"]
version:    "1.0"            # quoted = string
port:       11               # bare digits = decimal
octal_perms: 0o644            # explicit prefix

YAML's shape. TOML's strictness. + for list items so paths never repeat. One indent rule, distinct types, a small spec you can read over lunch.

YAML
servers:
  - name: web1
    disks:
      - mount: /
      - mount: /var
  - name: web2

countries:
  - US
  - UK
  - NO          # silently false
version: 1.0     # float, not "1.0"
port:    011     # octal 9 in YAML 1.1

Clean shape. Eighty-five-page spec, plain scalars that type-coerce on what they look like, anchors and merge keys baked into the data model.

TOML
[[servers]]
name = "web1"

[[servers.disks]]
mount = "/"

[[servers.disks]]
mount = "/var"

[[servers]]
name = "web2"

Tight grammar, unambiguous types. Then you nest something and the path repeats five times. No block comments, no list-or-scalar root, no heredocs.

A quick tour

Every feature DMS has, in one short scroll.

Comments — five forms, mix freely

# line comment, hash style
// line comment, slash style — both equivalent

port: 8080   # trailing on the same line

/* C-style block — inline or multi-line.
   /* Nests, so commenting out commented code just works. */
   Closes on the matching */

host: /* inline, between key and value */ "db.internal"

###
  Hash-block, unlabeled. Closes on a bare ### line.
  ###

###NOTE
  Hash-block, labeled. Closes on a line equal to NOTE.
  Pick any label when the body contains stray */ or ###.
  NOTE

Strings — basic, literal, heredoc

basic:    "escapes are processed: \n \t é"
literal:  'C:\Users\ada — backslashes stay backslashes'

# Strip trailing newlines (YAML's |-):
sql: """SQL _trim("\n", ">")
    SELECT id, email
      FROM users
     WHERE active = true
    SQL

# Fold paragraphs (YAML's >+):
prose: """DOC _fold_paragraphs()
    The quick brown fox
    jumps over the lazy dog.

    Sphinx of black quartz,
    judge my vow.
    DOC

# Literal heredoc — no escape processing:
regex: '''RE
    ^[A-Za-z_][A-Za-z0-9_]*$
    RE
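
As a rough illustration of what paragraph folding does to the prose heredoc above, the Python sketch below joins the lines inside each paragraph with a single space and keeps blank-line paragraph breaks. The helper name fold_paragraphs and the exact handling of indentation and trailing newlines are assumptions made for the example; the spec defines the real _fold_paragraphs behavior.

```python
import textwrap


def fold_paragraphs(body: str) -> str:
    """Join lines within each paragraph with one space; keep blank-line breaks."""
    # Assumption: the heredoc body is dedented and outer newlines are dropped first.
    paragraphs = textwrap.dedent(body).strip("\n").split("\n\n")
    return "\n\n".join(" ".join(p.split("\n")) for p in paragraphs)


body = (
    "    The quick brown fox\n"
    "    jumps over the lazy dog.\n"
    "\n"
    "    Sphinx of black quartz,\n"
    "    judge my vow.\n"
)
folded = fold_paragraphs(body)
```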

Numbers — base prefixes, no inference

port:       8080            # always decimal integer
mask:       0xFF_00_00      # hex with digit underscores
perms:      0o644           # octal
flags:      0b1010_0101     # binary
ratio:      0.42            # decimal float
hex_float:  0x1.8p3         # hex float (= 12.0)
sentinel:   nan             # also: inf, -inf

ready:      true
shutdown:   false           # not "no", not "off"
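
The "no inference" idea can be sketched as a scalar classifier that decides the type from the token's syntax alone. The Python below is a toy under stated assumptions, not a DMS parser: classify_scalar is a hypothetical name, escape processing and dates are left out, the decimal patterns are simplified, and rejecting unquoted words is my reading of "quoted is a string" rather than something this page spells out.

```python
import re

_INT = re.compile(r"[+-]?[0-9]+")
_FLOAT = re.compile(r"[+-]?[0-9]+\.[0-9]+")


def classify_scalar(token: str):
    """Return the Python value for one scalar token, or raise ValueError."""
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "\"'":
        return token[1:-1]  # quoted is always a string (escapes not processed here)
    if token in ("true", "false"):
        return token == "true"
    if token in ("nan", "inf", "-inf"):
        return float(token)
    lowered = token.lower()
    for prefix, base in (("0x", 16), ("0o", 8), ("0b", 2)):
        if lowered.startswith(prefix):
            if base == 16 and "p" in lowered:
                return float.fromhex(token)  # hex float such as 0x1.8p3
            return int(token, base)  # int() accepts the prefix and digit underscores
    if _INT.fullmatch(token):
        return int(token)  # bare digits are always read as decimal
    if _FLOAT.fullmatch(token):
        return float(token)
    # Assumption: an unquoted word is an error, never a guessed string or boolean.
    raise ValueError(f"bare word {token!r} is not a scalar; quote it to make a string")
```

Under these assumptions "1.0" in quotes stays a string, 0o644 is 420, and a bare NO raises instead of quietly becoming false.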

Dates & times — first-class

deployed: 2026-04-24T09:30:00-04:00   # offset datetime
window:   2026-04-24T09:30:00         # local datetime
release:  2026-04-24                  # local date
cutover:  09:30:00                    # local time
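
The four temporal kinds above map naturally onto distinct host-language types. The sketch below uses Python's standard ISO 8601 parsers to tell them apart; it is only an illustration (classify_temporal is a hypothetical helper, and Python's fromisoformat accepts a somewhat different set of inputs than the DMS grammar does).

```python
from datetime import date, datetime, time


def classify_temporal(token: str):
    """Return (kind, value) for one date/time token, using stdlib ISO parsing."""
    if "T" in token:
        value = datetime.fromisoformat(token)
        # An explicit UTC offset makes it an offset datetime; otherwise it is local.
        kind = "offset datetime" if value.tzinfo is not None else "local datetime"
        return kind, value
    if ":" in token:
        return "local time", time.fromisoformat(token)
    return "local date", date.fromisoformat(token)
```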

Lists & tables — block and flow

# block list with "+" — one character, never doubled brackets
servers:
  + name: "web1"
    port: 443
  + name: "web2"
    port: 443

# flow list (inline)
regions: ["us-east-1", "eu-west-1", "ap-south-1"]

# block table (indent)
database:
  host: "db.internal"
  port: 5432

# flow table (inline)
cache: { host: "redis", db: 0 }

Front matter — for document metadata

+++
app_name:     "myservice"
doc_version:  "1.2.3"
updated:      2026-04-23
+++

# the actual document body starts here
database:
  host: "db.internal"
  port: 5432

Polymorphic root — list, table, scalar, or empty

# all three are valid DMS documents:

# 1. a table
title: "production"

# 2. a bare list
+ "apples"
+ "oranges"

# 3. a single scalar
42

Unicode keys — bare, no quoting required

résumé:            "ada.pdf"
こんにちは:         "hello"
"path with space": "/etc/dms.conf"   # quote only the unusual

Comments survive round-trip

Configuration files are documentation. A comment such as # raised after the LB change in 2024-Q4 on port: 8080 exists so someone in 2026 can read it. If the first formatter or deploy-template renderer drops it, the documentation was a lie.

DMS makes comments first-class AST nodes attached to the value tree at one of four positions. encode(decode(source)) walks the tree and writes them back where they belong. Modify the data — rename a key, sort a list, delete a server — and the comments on the still-present nodes travel with them. Round-trip is byte-stable on the second pass.

Leading
# raised after the LB change
port: 8080

The line above a node, no blank line between. Stacks if you write more than one.

Trailing
port: 8080   # raised after the LB change

Same line as the value, after it. A line comment ends the line; /* */ blocks can stack inline.

Inner
port: /* TODO: read from env */ 8080

Between a key's : and its value. /* … */ only — line comments would eat the value.

Floating
database:
  host: "db.internal"
  port: 5432

  # pool tuning lives in deploy/db.dms

Block-final note, separated from siblings by a blank line. Stays attached to the parent, not the last child.
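
One way to picture the four positions is as fields on the tree itself. The Python sketch below is a deliberately tiny model, not the reference AST: Entry, Table, and emit are hypothetical names, values are kept as already-rendered text, and only a single flat table is handled. It shows why comments travel with their nodes: reordering or deleting entries moves the attached comments along, and the floating comment stays with the parent.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    key: str
    value: str                                    # already-rendered scalar text
    leading: list = field(default_factory=list)   # comment lines directly above
    inner: str = ""                               # /* ... */ between ":" and the value
    trailing: str = ""                            # comment after the value, same line


@dataclass
class Table:
    entries: list = field(default_factory=list)
    floating: list = field(default_factory=list)  # block-final comments, owned by the table


def emit(table: Table) -> str:
    """Write entries back out with each comment at its recorded position."""
    lines = []
    for e in table.entries:
        lines.extend(e.leading)
        parts = [f"{e.key}:"]
        if e.inner:
            parts.append(e.inner)
        parts.append(e.value)
        line = " ".join(parts)
        if e.trailing:
            line += "   " + e.trailing
        lines.append(line)
    if table.floating:
        lines.append("")  # the blank line is what separates a floating comment
        lines.extend(table.floating)
    return "\n".join(lines) + "\n"


table = Table(
    entries=[
        Entry("port", "8080",
              leading=["# raised after the LB change"],
              inner="/* TODO: read from env */"),
        Entry("host", '"db.internal"'),
    ],
    floating=["# pool tuning lives in deploy/db.dms"],
)
table.entries.sort(key=lambda e: e.key)  # mutate: host now comes first
text = emit(table)
```

After the sort, the leading and inner comments are still attached to port, and the floating note still closes the block.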

Format   Comments survive parse → modify → re-emit?
DMS      Yes: every reference parser, by spec
JSON     No: no comments in spec
JSON5    No: every library drops them
YAML     ruamel.yaml only (Python), opt-in, slower
TOML     toml-edit crate only (Rust), separate value type

Fast, too

DMS was designed for readability, but the parsers turn out to be quick. On a real production Helm chart's values.yaml (kube-prometheus-stack 84.3.0, 4–5k key-value pairs, ~25 KB), DMS beats YAML in nearly every cell, decode and encode; the exceptions are the Zig and PHP decode rows and the Zig encode cell, which the notes flag as implausible. Measured in lite mode.

Decode (parse only)

Each driver reads stdin, parses to an in-memory tree, prints ok\n. Cells are median wall time across 15 timed iterations after 2 warmup iterations, with a fresh process per iteration, startup included. Lower is better.
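
The measurement loop described above can be sketched in a few lines of Python. This is an assumed reconstruction, not the project's benchmark script: median_wall_ms is a hypothetical name, and the stand-in driver is just a Python one-liner that reads stdin and prints ok so the example runs anywhere.

```python
import statistics
import subprocess
import sys
import time


def median_wall_ms(cmd, stdin_bytes, timed=15, warmup=2):
    """Median wall time in ms over `timed` runs, each a fresh process, after `warmup` runs."""
    samples = []
    for i in range(warmup + timed):
        start = time.perf_counter()
        result = subprocess.run(cmd, input=stdin_bytes, capture_output=True, check=True)
        elapsed_ms = (time.perf_counter() - start) * 1000.0  # includes process startup
        if result.stdout.strip() != b"ok":
            raise RuntimeError("driver did not print ok")
        if i >= warmup:
            samples.append(elapsed_ms)  # warmup runs are executed but not recorded
    return statistics.median(samples)


# Stand-in driver (an assumption for this sketch): reads stdin, prints ok.
driver = [sys.executable, "-c", "import sys; sys.stdin.buffer.read(); print('ok')"]
```

Because every iteration spawns a new process, interpreter or runtime startup is part of each cell, which is why the slower-starting languages cluster at the bottom of the table.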

Language         DMS        YAML       TOML         JSON
C                5.62 ms    6.61 ms    5.64 ms      5.66 ms
Zig              5.55 ms    5.05 ms    4.46 ms      4.58 ms
Rust             7.27 ms    7.52 ms    6.89 ms      6.27 ms
Go               8.34 ms    9.62 ms    9.27 ms      8.19 ms
Crystal          12 ms      13 ms      370 ms ✱     12 ms
Lua              16 ms      21 ms      26 ms        8.6 ms
Perl (DMS-XS)    24 ms      27 ms      177 ms       27 ms
Perl (pure)      33 ms
C#               44 ms      79 ms      61 ms        45 ms
Python (dms_c)   45 ms      65 ms      49 ms        44 ms
Node             49 ms      65 ms      57 ms        42 ms
Python (pure)    56 ms
PHP              63 ms      62 ms      78 ms        47 ms
Java             125 ms     188 ms     296 ms       237 ms
Ruby             132 ms     154 ms     197 ms       121 ms

The C and Zig DMS rows shed parse cost via an ASCII fast path on the source-level NFC normalization: utf8proc's utf8proc_map walks every byte and allocates a fresh buffer regardless of input (~420 µs on this 25 KB fixture), but when the source is pure ASCII (the common case for config files) NFC is a no-op. A single byte scan catches that case and skips the heavy call.

†† Python/JSON: stdlib json parsed inside the startup-probe noise floor on this fixture; best-of-N parse time landed under the probe minimum, so the subtraction clamps to ~0.

✱ Crystal/TOML: manastech/crystal-toml has a wide-table pathology: 26 KB takes 351 ms. The DMS port in that same language doesn't have the issue.

The Python (pure) and Perl (pure) rows show DMS only. Pure-language peers for the other formats either don't exist meaningfully (stdlib JSON is C; PyYAML defaults to libyaml) or are the same parser everyone uses (tomli is pure, but it's the tomli). The split exists to show what the C extension buys you on top of the pure-language parser. The gap is modest now: Python ~1.2× (dms_c over pure-Python), Perl ~1.4× (DMS-XS over pure-Perl on this fixture). Pure-Perl closed most of the previous gap via a mega-regex fast path for lite-mode parsing that bypasses the recursive-descent walker; algorithmic parse cost is now ~5 ms in-process, beating pure-Python's ~8 ms.
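
The ASCII fast path mentioned in these notes is easy to show in miniature. The C and Zig ports do it with a byte scan in front of utf8proc; the Python sketch below is only an analogue using the standard library, with normalize_source as a hypothetical name.

```python
import unicodedata


def normalize_source(src: str) -> str:
    """Return src in NFC, skipping the normalizer entirely for pure-ASCII input."""
    if src.isascii():
        # ASCII text is already in NFC, so the scan alone settles it.
        return src
    return unicodedata.normalize("NFC", src)
```

For ASCII input the function hands back the same string object with no copy; only sources that contain non-ASCII characters pay for normalization.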

Encode (in-memory → text)

Each driver parses stdin once (untimed warmup), then loops the serialise step. Cells are median wall time of the timed loop (20 iters, 3 warmup). DMS encode = encode_lite — canonical-form output with no comment / original-form preservation.

Language         DMS        YAML          TOML         JSON
C                0.07 ms    1.0 ms ‡      n/a ‡        0.05 ms
Perl (DMS-XS)    0.08 ms    1.1 ms §      10 ms ¶      2.2 ms
Node             0.22 ms    0.99 ms       0.87 ms      0.08 ms
Crystal          0.24 ms    0.53 ms       0.12 ms      0.08 ms
Rust             0.32 ms    0.42 ms       0.35 ms      0.02 ms
Zig              0.46 ms    0.02 ms ω     0.03 ms      0.03 ms
PHP              0.55 ms    2.6 ms        5.9 ms       0.04 ms
C#               0.56 ms    7.9 ms        1.2 ms       0.15 ms
Python           0.70 ms    26 ms         1.4 ms       0.11 ms
Ruby             0.82 ms    4.3 ms        7.3 ms       0.13 ms
Go               1.0 ms     1.6 ms        0.53 ms      ε
Lua              1.0 ms     6.0 ms        ε            ε
Java             1.3 ms     5.4 ms        1.1 ms       0.22 ms
Perl (pure)      2.2 ms     25 ms §       10 ms ¶      2.2 ms

‡ C/TOML: no tomlc99 emit driver wired into bench_encoders.py yet (TOML parse-only is wired).

‡ C/YAML: emit goes through yaml_emitter_dump, which libyaml ties to a yaml_parser_load callback, so the timed loop pays parse + emit each iter, not pure emit. The number reflects what an application actually pays per serialise, but is not directly comparable to the other rows' "already-loaded structure → text" measurements.

§ Perl YAML splits on the library: YAML.pm (pure) vs. YAML::XS (XS).

¶ Perl TOML: only TOML::Tiny is widely used; the same encoder backs both Perl rows.

ε Cells reading 0 ms are broken-driver sentinels: the encoder loop short-circuited rather than serializing.

ω Zig/YAML emit at 22 µs is implausibly fast on a 28 KB output; the zig-yaml emitter likely skips work the others don't.

The companion fixtures are normalized per format so every parser actually parses, since real-world data exposed bugs in five of the comparison libraries: kubkon/zig-yaml errors on key: {} followed by a sibling block-mapping opener and on plain scalars containing a double quote; sam701/zig-toml errors on consecutive empty inline arrays; lua-toml rejects [[a.b.c]] array-of-tables headers when an intermediate parent has been declared; yosymfony/toml rejects inline-tables-inside-arrays; and manastech/crystal-toml decode time still scales pathologically with table width even on the small fixture (26 KB takes 351 ms). gen_companions.py drops the offending shapes per format so the bench cells return numbers rather than errors. The DMS port in each of these languages handles every shape unaltered.

Lite mode

Every DMS port ships a lite mode as of 0.2.0 (SPEC §Parsing modes — full and lite). Same parser, same grammar, same errors, same data tree — but with comment-AST construction and original_forms recording switched off. For read-only consumers (config loaders, deploy pipelines, CI scripts) that never call encode, that machinery is dead weight; lite mode skips it.

The chart above is lite mode. Switch to full mode and the parser earns its keep: it returns a tree where every value carries its source formatting and attached comments, ready to round-trip via encode. Same fixture, same comment density — but now full pays the comment-attachment cost and lite skips it, so the delta is exactly the preservation tax:

Language   Full        Lite        Lite is …
Rust       9.91 ms     9.36 ms     6% faster
Go         10.31 ms    9.62 ms     7% faster
C          10.79 ms    10.27 ms    5% faster
Zig        11.75 ms    10.89 ms    7% faster
Crystal    15.10 ms    14.14 ms    6% faster
Lua        51.52 ms    38.31 ms    26% faster
C#         60.61 ms    53.29 ms    12% faster
Median wall time, includes startup. Same machine, same fixture, same parser per row; only the parse mode changes.

The 5–7% wall-time wins on the fast rows look small but read very differently once you subtract startup. Parse-only, Rust full→lite is 3.76 ms → 2.04 ms (46%), Crystal is 2.67 ms → 1.39 ms (48%), and Go is 2.04 ms → 1.53 ms (25%): the comment-AST and original_forms bookkeeping is a genuinely non-trivial slice of pure parse cost in languages that finish in single-digit milliseconds. On the slower rows (Python / PHP / Java / Ruby) the delta gets buried under interpreter startup variance, so they're omitted here rather than dressed up.

Lite mode is opt-in per call: parse_lite_document(src) / parseLiteDocument(src) / decode_document(src, mode="lite") depending on the port's idiom. Full mode remains the default and the conformance floor.
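
The percentages quoted here are plain relative reductions, (full − lite) / full. The short Python check below recomputes them from the figures on this page; percent_faster is a hypothetical helper name, and nothing in it comes from the benchmark scripts themselves.

```python
def percent_faster(full_ms: float, lite_ms: float) -> float:
    """Relative time saved by lite mode, as a percentage of the full-mode time."""
    return (full_ms - lite_ms) / full_ms * 100.0


# (full, lite) in milliseconds, copied from the table and the parse-only figures.
wall = {"Rust": (9.91, 9.36), "Go": (10.31, 9.62), "Lua": (51.52, 38.31)}
parse_only = {"Rust": (3.76, 2.04), "Crystal": (2.67, 1.39), "Go": (2.04, 1.53)}

wall_pct = {lang: round(percent_faster(*pair)) for lang, pair in wall.items()}
parse_pct = {lang: round(percent_faster(*pair)) for lang, pair in parse_only.items()}
```

The same absolute saving looks several times larger in the parse-only view because the fixed startup cost no longer dilutes the denominator.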

Tier 1 — decorators and dialects

Structured annotations on top of the tier-0 value tree.

Tier 1 layers structured decorators on top of the tier-0 grammar. A decorator call is a sigil-prefixed function-shaped annotation (|tag(class: "lede"), 🚀deploy(env: "prod")) attached to any value-tree node. The decoration sidecar is parallel to the value tree — consumers walk one or both as they need.

Dialects bind sigils to families of decorators. dms+html carries HTML-shaped documents; dms+hcl carries HCL/Terraform configs; dms+kdl node-shaped data; dms+ron tagged ADTs; dms+k8s Kubernetes manifests with auto-mirrored labels and selectors. Reserved-emoji codepoints (🚀, 🇺🇸, 🏷️, 🔒, …) are first-class sigil atoms alongside ASCII.

Ready?

The spec is short, by design.