From XML to Docs-as-Code: Why We Made the Switch

For most of its life, our product documentation lived inside a proprietary XML editorial system: a closed authoring tool, a closed storage format, and a publishing pipeline that only one or two people fully understood. It worked, in the sense that manuals got published. It did not work as a foundation for anything else.

The problem wasn't the writing

The content itself was fine. The problem was everything around it. Versioning meant exporting snapshots rather than branching. Reviews happened over email threads with attached exports. And because the source format was proprietary, nothing outside the editorial tool could read it — not a CI pipeline, not a search index, and later, not an AI assistant.

Why Docs-as-Code

Moving to a Docs-as-Code workflow — plain-text AsciiDoc source, built and published with Antora — wasn't about chasing a trend. It solved three concrete problems at once:

Version control. Documentation could finally live next to the code it described, branch with releases, and go through the same pull-request review process as everything else in R&D.
Automation. Plain text is scriptable. I wrote a set of Python and Bash tools to handle the migration itself and, afterward, to automate repetitive parts of the publishing process that used to be manual.
Reuse. Once content is just text in a repository, it can be read by more than one consumer — a build pipeline, a linter, or later, a retrieval system for an AI chatbot.

A simplified version of what the migration tooling looked like:

for file in legacy_xml/*.xml; do
  python convert_xml_to_adoc.py "$file" > "antora/modules/ROOT/pages/$(basename "$file" .xml).adoc"
done

The real scripts had to deal with inconsistent legacy markup, broken cross-references, and image paths that had drifted for a decade — the unglamorous part of any migration that takes far longer than the conversion logic itself.

What changed for the team

The editorial workflow shifted from "open the tool, find the topic, edit it" to "clone the repo, edit the file, open a pull request." That's a bigger change for a documentation team than it sounds like — it meant learning Git, getting comfortable with diffs as the unit of review, and trusting a build pipeline instead of a WYSIWYG preview. It paid off in review quality: diffs make it much easier to spot an unintended change than a side-by-side export ever did.

What I'd do differently

Knowing what I know now, I'd invest earlier in linting and style enforcement. Once content is plain text, tools like Vale or a markdownlint-style checker can catch terminology drift and style violations automatically — we added that only after the migration was already underway, and retrofitting style rules onto years of existing content is much harder than enforcing them from day one.