How Markdoc works

4.1 From Stripe's documentation problem to an open-source solution

Markdoc was built by Stripe (the payments technology company) to power their public documentation at stripe.com/docs. Before Markdoc, Stripe's documentation ran on a monolithic system where content freely mixed HTML, Markdown, Ruby code, and templating logic. As Ryan Paul of Stripe's documentation team wrote in 2022: "Content authoring effectively became software development, and with that became subject to the same technical complexity and overhead."

Stripe needed documentation that was easy for writers to author, rich enough for interactive components, and, crucially, machine-readable from top to bottom. They wanted to treat their documentation as data, not code. In May 2022, Stripe open-sourced Markdoc under an MIT license, making it available to any organization facing similar challenges.

The philosophy Stripe articulated underpins everything that follows: "Docs as data, not docs as code." Where other systems embed programming logic inside documents (making them powerful but fragile and opaque), Markdoc keeps content declarative, structured, and predictable, so that a document can be fully serialized as data.

4.2 The three-phase pipeline: parse, transform, render

Markdoc processes content through three distinct phases, each producing a clear intermediate output. Understanding this pipeline is the key to understanding why Markdoc is different from simply writing in Word or Markdown.

Phase 1: Parse. The raw Markdoc text (which is standard Markdown with some extensions) is read and converted into an Abstract Syntax Tree (AST), the structural blueprint of the document. Every heading, paragraph, list, custom tag, and piece of metadata becomes a node in the tree. The parser identifies what each piece of content is and how it relates to everything around it. Think of this as the surveyor's inspection, cataloguing every element and recording its position in the hierarchy.
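To make the idea of a tree concrete, here is a minimal sketch of how a short parsed document might look as plain data. The node shape (type, attributes, children, line) is a simplified assumption made for illustration, and outline is a hypothetical helper; the AST that Markdoc actually produces carries more fields than this.

```javascript
// Hand-written, simplified AST for this three-line source:
//   # Hull inspection
//   Check the plating.
//   {% callout type="warning" %}Wear gloves.{% /callout %}
// Field names are illustrative assumptions, not Markdoc's exact node shape.
const ast = {
  type: 'document',
  children: [
    { type: 'heading', attributes: { level: 1 }, line: 1,
      children: [{ type: 'text', attributes: { content: 'Hull inspection' }, line: 1, children: [] }] },
    { type: 'paragraph', attributes: {}, line: 2,
      children: [{ type: 'text', attributes: { content: 'Check the plating.' }, line: 2, children: [] }] },
    { type: 'tag', tag: 'callout', attributes: { type: 'warning' }, line: 3,
      children: [{ type: 'text', attributes: { content: 'Wear gloves.' }, line: 3, children: [] }] },
  ],
};

// Walk the tree depth-first and record each node's kind, indented by depth.
function outline(node, depth = 0, out = []) {
  out.push('  '.repeat(depth) + (node.tag ? `tag:${node.tag}` : node.type));
  for (const child of node.children || []) outline(child, depth + 1, out);
  return out;
}

console.log(outline(ast).join('\n'));
```

The printed outline shows the hierarchy the parser records: a document containing a heading, a paragraph, and a callout tag, each with its own text child.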

Phase 2: Transform. The AST is processed according to a configuration (specifically, schemas that define which tags are allowed, which attributes they accept, and how they should be rendered). Variables are resolved and custom tags are expanded. The output is a "renderable tree," a resolved, simplified version of the AST ready for final output. Together with the validation step that checks the AST against the same schemas, this is like the plan approval process, confirming that every element meets the required specifications before the vessel is declared fit.
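The transform step can be sketched in a few lines. This is a toy written for this chapter, not Markdoc's implementation: the config layout loosely mirrors the idea of tag schemas plus variables, while the transform function, the node shapes, and the vesselName variable are all assumptions made for illustration.

```javascript
// Toy transform: AST + config -> renderable tree. Not Markdoc's implementation.
const config = {
  variables: { vesselName: 'MV Example' },
  tags: {
    // The callout tag renders as an <aside>; its type defaults to "note".
    callout: { render: 'aside', attributes: { type: { default: 'note' } } },
  },
};

function transform(node, config) {
  const kids = () => node.children.map((child) => transform(child, config));
  switch (node.type) {
    case 'text':
      return node.content;
    case 'variable':
      // Replace the reference with its configured value.
      return String(config.variables[node.name]);
    case 'paragraph':
      return { name: 'p', attributes: {}, children: kids() };
    case 'tag': {
      const schema = config.tags[node.tag];
      if (!schema) throw new Error(`Unknown tag: ${node.tag}`);
      // Start from schema defaults, then let authored attributes override them.
      const attributes = {};
      for (const [key, rule] of Object.entries(schema.attributes)) {
        if ('default' in rule) attributes[key] = rule.default;
      }
      Object.assign(attributes, node.attributes);
      return { name: schema.render, attributes, children: kids() };
    }
    default:
      throw new Error(`Unhandled node type: ${node.type}`);
  }
}

// Simplified AST for: {% callout %}Inspect {% $vesselName %}{% /callout %}
const ast = {
  type: 'tag', tag: 'callout', attributes: {},
  children: [{ type: 'paragraph', children: [
    { type: 'text', content: 'Inspect ' },
    { type: 'variable', name: 'vesselName' },
  ] }],
};

const renderable = transform(ast, config);
console.log(JSON.stringify(renderable, null, 2));
```

The result contains no variables and no custom tags, only named output elements, attributes, and text, which is what makes the final rendering step simple.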

Phase 3: Render. The renderable tree is converted into its final output format: HTML for a website, React components for an interactive application, or any other format a custom renderer supports. The same renderable tree can be passed to several renderers to produce multiple outputs. This is the delivery: the same approved design can be built as a physical vessel, rendered as a 3D model, or printed as construction drawings.
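As a rough illustration of one tree feeding several outputs, the sketch below passes the same small renderable tree to two renderers. Both renderHtml and renderText are hypothetical functions written for this example, and the tree shape (name, attributes, children) is a simplified assumption rather than Markdoc's own renderer code.

```javascript
// A small renderable tree: element nodes plus plain strings.
const tree = {
  name: 'article', attributes: {},
  children: [
    { name: 'h1', attributes: {}, children: ['Hull inspection'] },
    { name: 'p', attributes: { class: 'lead' }, children: ['Check the plating & welds.'] },
  ],
};

// Escape characters that are significant in HTML text and attribute values.
const escapeHtml = (s) =>
  s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');

// Renderer 1: an HTML string.
function renderHtml(node) {
  if (typeof node === 'string') return escapeHtml(node);
  const attrs = Object.entries(node.attributes)
    .map(([key, value]) => ` ${key}="${escapeHtml(String(value))}"`)
    .join('');
  return `<${node.name}${attrs}>${node.children.map(renderHtml).join('')}</${node.name}>`;
}

// Renderer 2: plain text, with the article's children on separate lines.
function renderText(node) {
  if (typeof node === 'string') return node;
  const separator = node.name === 'article' ? '\n' : '';
  return node.children.map(renderText).join(separator);
}

console.log(renderHtml(tree));
console.log(renderText(tree));
```

Neither renderer modifies the tree, so adding a third output format means writing a third function, not touching the content.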

In code, the entire pipeline is remarkably concise:

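import Markdoc from '@markdoc/markdoc';

const source = '# Hello, Markdoc';              // Raw Markdoc text (example input)
const ast = Markdoc.parse(source);              // Text → AST
const content = Markdoc.transform(ast);         // AST → renderable tree
const html = Markdoc.renderers.html(content);   // Renderable tree → HTML string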

Between the parse and transform phases, Markdoc.validate can check the AST against the schema rules and return a list of errors, each tied to a specific location in the source. This enables automated quality checks before any content is published.
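The kind of check described here can be sketched without the library. The validate function below is a toy written for this chapter, not Markdoc's validator; the schema layout, the error shape (line and message), and the tag names are assumptions chosen to show the principle of collecting located errors instead of failing on the first one.

```javascript
// Schema: which tags exist and what each attribute requires or allows.
const schema = {
  callout: { attributes: { type: { required: true, matches: ['note', 'warning'] } } },
};

// Walk the AST and collect errors, so every problem is reported in one pass.
function validate(node, schema, errors = []) {
  if (node.type === 'tag') {
    const rule = schema[node.tag];
    if (!rule) {
      errors.push({ line: node.line, message: `Unknown tag '${node.tag}'` });
    } else {
      for (const [name, attr] of Object.entries(rule.attributes)) {
        const value = node.attributes[name];
        if (value === undefined) {
          if (attr.required) {
            errors.push({ line: node.line, message: `Missing required attribute '${name}'` });
          }
        } else if (attr.matches && !attr.matches.includes(value)) {
          errors.push({ line: node.line, message: `Invalid value '${value}' for attribute '${name}'` });
        }
      }
    }
  }
  for (const child of node.children || []) validate(child, schema, errors);
  return errors;
}

// Simplified AST with three deliberately faulty tags.
const ast = {
  type: 'document',
  children: [
    { type: 'tag', tag: 'callout', line: 3, attributes: { type: 'danger' }, children: [] },
    { type: 'tag', tag: 'callout', line: 7, attributes: {}, children: [] },
    { type: 'tag', tag: 'checklist', line: 12, attributes: {}, children: [] },
  ],
};

const errors = validate(ast, schema);
for (const e of errors) console.log(`line ${e.line}: ${e.message}`);
```

A publishing script could run a check like this on every document and refuse to continue whenever the returned list is not empty.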

4.3 What makes Markdoc different from plain Markdown and from MDX

Plain Markdown is familiar and readable, but it is flat. It provides no mechanism for custom components, variables, conditional content, content reuse, or schema validation. You cannot define a "callout box" or a "data table with specific required columns" in plain Markdown. You cannot validate that a document contains all required sections.

MDX solves the extensibility problem by embedding JavaScript and React components directly inside Markdown. This is powerful but comes with significant costs. Content authors must understand programming concepts, documents can execute arbitrary code (a security concern), and the resulting AST is tied to a JavaScript runtime, meaning it cannot be easily serialized, stored, or queried as pure data. MDX treats docs as code.

Markdoc takes a different path. It extends Markdown with a tag syntax ({% tagname %}) that is purely declarative. Tags describe what content should appear, not how to compute it. No arbitrary code runs inside the document. The AST is fully serializable to JSON; it can be stored in a database, sent across a network, cached, queried, or manipulated by any programming language. Markdoc treats docs as data.
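What "docs as data" means in practice can be shown with a short sketch. The tree below is hand-written and simplified (its field names are assumptions, not Markdoc's exact AST), and findTags is a hypothetical helper. The point is only that a tree made of plain values survives a JSON round trip and can then be queried like any other data.

```javascript
// Simplified, hand-written tree for a document containing two callout tags.
const ast = {
  type: 'document',
  children: [
    { type: 'tag', tag: 'callout', attributes: { type: 'note' }, children: [
      { type: 'text', content: 'Log the inspection date.', children: [] },
    ] },
    { type: 'paragraph', attributes: {}, children: [
      { type: 'tag', tag: 'callout', attributes: { type: 'warning' }, children: [
        { type: 'text', content: 'Wear gloves.', children: [] },
      ] },
    ] },
  ],
};

// The tree holds only plain data (no functions, no closures),
// so it can be stored as JSON and restored unchanged.
const stored = JSON.stringify(ast);
const restored = JSON.parse(stored);

// Query the restored tree: collect every tag with a given name, at any depth.
function findTags(node, name, found = []) {
  if (node.type === 'tag' && node.tag === name) found.push(node);
  for (const child of node.children || []) findTags(child, name, found);
  return found;
}

const warnings = findTags(restored, 'callout')
  .filter((tag) => tag.attributes.type === 'warning');
console.log(`${warnings.length} warning callout(s) found`);
```

The same stored string could just as well be read by a program written in another language, since nothing in it depends on a JavaScript runtime.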

This distinction might seem subtle, but it is the difference between a document that is a transparent, inspectable data structure and a document that is an executable program. For regulated industries where content must be validated, audited, and traced, the data approach is fundamentally safer and more powerful.
