3 What is an Abstract Syntax Tree?

3.1 The concept in plain language

An Abstract Syntax Tree (AST for short) is a way of representing the structure and meaning of a document as a tree-shaped data structure that a computer can process. If you have ever looked at a table of contents, you have already seen a tree structure. The book title sits at the top, chapters branch beneath it, sections nest within chapters, and paragraphs within sections. An AST is essentially a machine-readable table of contents that also carries the actual content attached to each branch.

The word "abstract" simply means the tree captures structure and meaning rather than raw formatting details. It strips away noise (extra whitespace, markup punctuation such as bullet characters, formatting codes) and preserves only the meaningful relationships between parts of the document. The word "syntax" refers to the rules governing how the document is organized. And "tree" describes the shape, with a single root at the top branching into increasingly specific children below.

3.2 A maritime analogy: the ship's General Arrangement plan

Consider a vessel's General Arrangement (GA) plan. At the highest level, you have the vessel itself. Beneath that, individual decks. Each deck contains compartments (the engine room, the bridge, cargo holds, accommodation spaces). Each compartment contains equipment, systems, and fittings. This is a tree. One root (the vessel) branches into decks, which branch into compartments, which branch into equipment.

Now imagine this GA plan is not a drawing pinned to a bulkhead but a structured data file that a computer can read. You could ask the computer to show you every compartment on Deck 3, to list all fire suppression equipment across all decks, or to find which compartments contain equipment that was last inspected more than twelve months ago. A static drawing cannot answer these queries by itself; someone has to search it by eye. Once the structure is represented as data, as a tree, each query becomes a short lookup.

An AST does for documents what a structured GA plan does for a vessel. It makes every heading, paragraph, list, table, and cross-reference individually addressable and queryable.
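To make the three GA-plan queries above concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the Node class, the vessel name, the equipment, the categories, and the inspection dates are invented, and "twelve months" is approximated as 365 days.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Node:
    """One node of a hypothetical GA-plan tree (vessel, deck, compartment, equipment)."""
    kind: str
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def walk(node):
    """Yield the node and every descendant, parents before children."""
    yield node
    for child in node.children:
        yield from walk(child)


# Illustrative data only; names and dates are invented for this sketch.
vessel = Node("vessel", "MV Example", children=[
    Node("deck", "Deck 3", children=[
        Node("compartment", "Engine Room", children=[
            Node("equipment", "CO2 system",
                 {"category": "fire suppression", "last_inspected": date(2023, 1, 10)}),
            Node("equipment", "Main engine",
                 {"category": "propulsion", "last_inspected": date(2024, 5, 2)}),
        ]),
        Node("compartment", "Cargo Hold 1", children=[
            Node("equipment", "Bilge pump",
                 {"category": "pumping", "last_inspected": date(2024, 3, 15)}),
        ]),
    ]),
    Node("deck", "Deck 5", children=[
        Node("compartment", "Bridge", children=[
            Node("equipment", "Portable extinguisher",
                 {"category": "fire suppression", "last_inspected": date(2024, 4, 1)}),
        ]),
    ]),
])


def compartments_on(root, deck_name):
    """Names of the compartments directly under the named deck."""
    return [c.name
            for deck in root.children if deck.name == deck_name
            for c in deck.children if c.kind == "compartment"]


def equipment_in_category(root, category):
    """Names of all equipment in the given category, across every deck."""
    return [n.name for n in walk(root)
            if n.kind == "equipment" and n.attrs.get("category") == category]


def compartments_overdue(root, today, max_days=365):
    """Compartments holding equipment last inspected more than max_days ago."""
    overdue = []
    for n in walk(root):
        if n.kind != "compartment":
            continue
        if any((today - e.attrs["last_inspected"]).days > max_days
               for e in n.children if e.kind == "equipment"):
            overdue.append(n.name)
    return overdue


print(compartments_on(vessel, "Deck 3"))
print(equipment_in_category(vessel, "fire suppression"))
print(compartments_overdue(vessel, today=date(2024, 6, 1)))
```

Each query is a few lines because the relationships (deck contains compartment, compartment contains equipment) are stored explicitly rather than implied by a drawing.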

3.3 How flat text becomes a tree

Take a simple maritime procedure written as plain text:

Pre-Departure Checklist

Navigation
  Charts updated
  GPS operational
  Voyage plan filed

Safety Equipment
  Life rafts inspected
  Fire extinguishers checked
  EPIRB tested

To a human reader, the structure is obvious. There is a title, two categories, and items within each category. But to a computer reading this as a flat string of characters, it is just a sequence of letters and line breaks. Parsing this text into an AST produces something like:

Document
├── Heading: "Pre-Departure Checklist"
├── Section: "Navigation"
│   ├── Item: "Charts updated"
│   ├── Item: "GPS operational"
│   └── Item: "Voyage plan filed"
└── Section: "Safety Equipment"
    ├── Item: "Life rafts inspected"
    ├── Item: "Fire extinguishers checked"
    └── Item: "EPIRB tested"

Now the computer knows that "Charts updated" belongs to "Navigation," which in turn belongs to the document headed "Pre-Departure Checklist." It can count items per section, check completeness, or render the same content as a web page, a PDF, or a mobile checklist. All from the same underlying tree.
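The parsing step described above can be sketched in a few lines of Python. This is not a general-purpose parser. It assumes exactly the layout of the sample checklist: the first non-blank line is the heading, every other unindented line opens a section, and indented lines are items of the most recent section. The function name parse_checklist and the dictionary keys are hypothetical choices for this sketch.

```python
def parse_checklist(text):
    """Parse the sample checklist layout into a nested dict (assumed rules only)."""
    doc = {"type": "Document", "children": []}
    section = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no structure
        label = line.strip()
        if line[0].isspace():
            # Indented line: an item of the most recently opened section.
            if section is None:
                raise ValueError(f"item outside any section: {label!r}")
            section["children"].append({"type": "Item", "text": label})
        elif not doc["children"]:
            # First unindented line: the document heading.
            doc["children"].append({"type": "Heading", "text": label})
        else:
            # Any later unindented line: a new section.
            section = {"type": "Section", "text": label, "children": []}
            doc["children"].append(section)
    return doc


CHECKLIST = """\
Pre-Departure Checklist

Navigation
  Charts updated
  GPS operational
  Voyage plan filed

Safety Equipment
  Life rafts inspected
  Fire extinguishers checked
  EPIRB tested
"""

tree = parse_checklist(CHECKLIST)

# Count items per section, one of the operations the tree makes possible.
items_per_section = {
    node["text"]: len(node["children"])
    for node in tree["children"]
    if node["type"] == "Section"
}
print(items_per_section)  # {'Navigation': 3, 'Safety Equipment': 3}
```

The indentation that a human reads at a glance is the only signal this sketch relies on, which is why real document formats define their structure rules far more carefully.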

3.4 The vocabulary of trees

A handful of terms recur throughout this guide. Each maps to something familiar.

A node is any single item in the tree (a heading, a paragraph, a list item). Think of it as a single compartment on your vessel.

The root is the topmost node, the starting point from which everything branches. It is the vessel itself.

A parent is a node that contains other nodes, like a chapter that holds sections.

A child is a node contained within a parent, such as a paragraph inside a section.

A leaf is a node with no children, sitting at the very end of a branch. It usually holds the actual text content, like a specific equipment item.

Traversal is the process of visiting every node in order, like conducting a systematic inspection of every space on the ship, deck by deck, compartment by compartment.
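The vocabulary above maps directly onto code. The following Python sketch represents a small tree as nested (label, children) tuples, an arrangement chosen here purely for brevity, and performs a depth-first traversal that reports each node's depth, its parent, and whether it is a leaf.

```python
# Each node is a (label, children) tuple; a leaf has an empty child list.
tree = ("Document", [
    ("Heading: Pre-Departure Checklist", []),
    ("Section: Navigation", [
        ("Item: Charts updated", []),
        ("Item: GPS operational", []),
    ]),
])


def traverse(node, parent=None, depth=0):
    """Depth-first traversal yielding (depth, label, parent_label, is_leaf)."""
    label, children = node
    yield depth, label, parent, not children
    for child in children:
        yield from traverse(child, label, depth + 1)


visited = list(traverse(tree))

root = visited[0][1]  # the first node visited is the root
leaves = [label for _, label, _, is_leaf in visited if is_leaf]
parent_of = {label: parent for _, label, parent, _ in visited}

for depth, label, _, is_leaf in visited:
    print("  " * depth + label + ("  (leaf)" if is_leaf else ""))
```

Visiting a parent before its children, as done here, is one common traversal order; other orders exist and suit other tasks.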

These terms are not jargon for its own sake. They are precise descriptions of relationships that enable powerful operations. Consider: find every leaf node of type "requirement" whose parent is tagged "SOLAS Chapter III." That query, run against a document stored as an AST, returns every life-saving appliance requirement in seconds.
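The query just described might look like the following in Python. This is a sketch under stated assumptions: the Node class, its type and tags fields, and the function find_requirements are hypothetical, and the requirement texts are placeholders rather than actual SOLAS wording.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical document node with a type, optional text, tags, and children."""
    type: str
    text: str = ""
    tags: set = field(default_factory=set)
    children: list = field(default_factory=list)


def find_requirements(node, tag):
    """Texts of leaf nodes of type 'requirement' whose direct parent carries the tag."""
    hits = []
    for child in node.children:
        is_leaf = not child.children
        if child.type == "requirement" and is_leaf and tag in node.tags:
            hits.append(child.text)
        hits.extend(find_requirements(child, tag))
    return hits


# Placeholder content only; not real regulatory text.
doc = Node("document", children=[
    Node("section", "Life-saving appliances", {"SOLAS Chapter III"}, [
        Node("requirement", "placeholder requirement A"),
        Node("paragraph", "explanatory note"),
        Node("requirement", "placeholder requirement B"),
    ]),
    Node("section", "Fire protection", {"SOLAS Chapter II-2"}, [
        Node("requirement", "placeholder requirement C"),
    ]),
])

print(find_requirements(doc, "SOLAS Chapter III"))
# ['placeholder requirement A', 'placeholder requirement B']
```

The filter is expressed entirely in the tree vocabulary: node type, leaf status, and a property of the parent.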
