YAML should die (Actually it should not.)
YAML is a pretty powerful and convenient human-readable serialization language.
Unfortunately it has been abused and twisted to do things it was never meant to do.
Software engineers should consider using (or creating) a DSL as soon as possible in their design process.
And finally, I think the world deserves at least one standardized DSL dedicated to CI/CD.
This is another rant that stayed in my head for too long before I finally decided to write it here.
Hell, Martin Tournoij's post "YAML: probably not so great after all" dates back to 2016!
After using YAML for nine years, some points have been addressed, but I’m still angry enough to write this article.
The first lines of https://yaml.org/ read:
YAML: YAML Ain't Markup Language What It Is: YAML is a human friendly data serialization standard for all programming languages.
And what is it being used for?
in place of a DCL
in place of a DSL
to replace XML
XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable
I give you that: YAML is undoubtedly more human-friendly than XML, produces smaller files, and is arguably more machine-friendly as well (see XML criticism below).
To the naive, XML might seem an easier syntax to parse at first sight (especially if you know about my YAML criticism), but the specification has some really fucked-up features.
The first that comes to mind is the different ways to set attributes (applications commonly treat all of the following as equivalent):
<?xml version="1.0"?>
<params>
  <param foo="bar"/>
  <param foo="bar"></param>
  <param>
    <foo>bar</foo>
  </param>
</params>
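To make the pain concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. A parser sees the first two forms as the same thing (an attribute), but the third is a child element, so any "equivalence" has to be implemented by the consumer. The helper name `read_foo` is mine, purely for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<params>
  <param foo="bar"/>
  <param foo="bar"></param>
  <param>
    <foo>bar</foo>
  </param>
</params>"""


def read_foo(param):
    """Return 'foo' whether it was written as an attribute or as a child element."""
    if "foo" in param.attrib:
        return param.attrib["foo"]
    child = param.find("foo")
    return child.text if child is not None else None


root = ET.fromstring(DOC)
params = root.findall("param")

# Every consumer has to carry this kind of fallback logic around.
values = [read_foo(p) for p in params]
print(values)
```

Each application ends up writing its own variant of `read_foo`, and they do not all agree on which forms to accept.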
Another is the myriad of ways to encode strings, which have been abused throughout history to make documents do things they weren’t supposed to.
The language is extensible and an XML parser should follow the extensions; XLink is an example (it can be used like YAML anchors within the same document).
SVG being XML, it supports linking; or at least it should, but not all SVG renderers implement it because it’s a pain.
XML is more verbose than YAML (well, XML is more verbose than anything I can think of, except maybe some Microsoft formats) and a lot of programs rely on DOM parsing, which requires loading the whole document before processing. Thankfully there is SAX, which allows incremental parsing.
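The DOM versus SAX difference can be sketched with Python's standard library (`xml.dom.minidom` and `xml.sax`). The tiny document and the `ParamCounter` handler are made up for the example; the point is only that the DOM version builds the whole tree first, while the SAX handler reacts to elements as they stream by.

```python
import io
import xml.dom.minidom
import xml.sax

# A small stand-in document; imagine it being several gigabytes instead.
DOC = b"<params>" + b"<param foo='bar'/>" * 3 + b"</params>"

# DOM: the whole tree is built in memory before we can ask anything.
dom = xml.dom.minidom.parseString(DOC)
dom_count = len(dom.getElementsByTagName("param"))


class ParamCounter(xml.sax.ContentHandler):
    """Counts <param> elements as the parser encounters them."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "param":
            self.count += 1


# SAX: the handler is called element by element while the stream is read.
handler = ParamCounter()
xml.sax.parse(io.BytesIO(DOC), handler)

print(dom_count, handler.count)
```

Both approaches give the same answer here; what differs is how much has to sit in memory before the first element can be processed.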
The last criticism would be that a slight mistake renders the whole document invalid, and humans make mistakes.
From here on I will use the terms DCL and DSL interchangeably because the line between them is somewhat thin… Probably because the authors of the software themselves weren’t sure.
This part is the core of my complaint.
As a sysadmin (oops, the fashionable buzzword of this era is DevOps) I have to deal with the insufferable nonsense of using YAML instead of an appropriate DSL, leaving me no way to verify what I wrote without executing it.
- Configuration Management Software:
- Continuous Integration:
Circle CI, Concourse CI, GitLab CI, Travis CI… In fact most CI systems rely on YAML to describe your pipelines.
- IaaS, PaaS, IaC… The Cloud!
Cloud Foundry, Docker, Kubernetes… Okay, I stop here.
K8s has wreaked havoc.
As soon as you get a little serious with it, you need to use templates and YAML generators (Helm, Kapitan, ytt…), resulting in “configuration files” thousands of lines long (it’s even the Stackery sales pitch).
(Hint: No, it’s 42).
How many times did you have to rewrite your playbook/tasks before you achieved what you meant?
The YAML you wrote was valid, it was even accepted by the program, it began to execute and then b̶̨̼̣̑o̵̜̝̝͌́͝ỏ̷̘̜̱͔̽͘͝ḿ̷̤̯̻͙͒͛̈́…☠
- Now a somewhat rhetorical question:
“If no human is going to write those files, and nobody is going to read them; what is the point of using YAML?”.
> “You are wrong, humans are writing those files and reading them!”
Well, sort of. Humans are writing “parts” of those files, cursing the workarounds or the lack of them…
Whenever they are using templates (or a UI, like Stackery), I consider that they are not writing them; they could have written in an appropriate language which, in its turn, like the template, would have been “compiled” into some appropriate machine-readable format.
Whenever they are using a command to read them (yq 4, jq, Visual Studio Code Kubernetes Tools, K8sYAML…) they are not reading them; they are using a program to make human-readable something which was not.
You are using a CLI or a GUI to interact with your databases? The same to read/write images or their Exif? It is the same; it should be the same…
Furthermore, YAML is not even easy to parse (see YAML criticism)!
Some software made the [right!] choice to develop their DCL/DSL or at least to use something more appropriate than YAML.
As it is a generalist language, it is not a simple syntax to parse either.
It was created to serialize the data structures of many different languages while remaining pleasant for a human to read.
Let’s explore a few of its features:
builtin types (booleans, empty and null differentiation, floating point numbers, integers, mappings, scalars, sequences, timestamps, unordered sets…)
custom types, tags, tags shorthand
anchoring/aliasing, creating (at least) two issues:
self-referential / circular data structures are usually not welcome in CMS, CI/CD and configuration
is not intended for “recursive merging”, leading the GitLab team to create the extends keyword as a workaround
multiple ways to write the same thing:
builtin types may be explicit or implicit; in the following example (just for scalars) all lines are equivalent:
- !!str "string"
- "string"
- 'string'
- string
- !<tag:yaml.org,2002:str> "string"
- "\x73\x74\x72\x69\x6e\x67"
- "\u0073\u0074\u0072\u0069\u006e\u0067"
overly smart integer parsing:
- !!int "11"
- 11
- 0xb
- 0xB
- 013
- 1_1
There are 9 (or 63, depending on how you count) different ways to write multi-line strings in YAML.
optional header for version specification and directives
optional document separators
automatic but parametrizable indentation level
smart line/flow folding, block chomping
and much more…
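To show how much hides behind "overly smart integer parsing", here is a hand-written Python sketch of the YAML 1.1 integer forms (binary, octal, decimal, hexadecimal, sexagesimal, with `_` as a digit separator). It is my own simplified approximation, not a real YAML parser, and the function name `yaml11_int` is invented for the example.

```python
import re


def yaml11_int(text):
    """Resolve a plain scalar the way YAML 1.1 resolves integers (simplified).

    Returns an int, or None when the scalar would not be treated as an integer.
    """
    sign = 1
    s = text
    if s and s[0] in "+-":
        sign = -1 if s[0] == "-" else 1
        s = s[1:]
    if s.startswith("_"):
        return None  # underscores may only appear after the first character
    s = s.replace("_", "")  # "_" is just a visual separator
    if re.fullmatch(r"0b[01]+", s):
        return sign * int(s[2:], 2)
    if re.fullmatch(r"0x[0-9a-fA-F]+", s):
        return sign * int(s[2:], 16)
    if re.fullmatch(r"0[0-7]+", s):
        return sign * int(s, 8)  # a leading zero means octal in YAML 1.1
    if re.fullmatch(r"0|[1-9][0-9]*", s):
        return sign * int(s)
    if re.fullmatch(r"[1-9][0-9]*(:[0-5]?[0-9])+", s):
        value = 0
        for part in s.split(":"):  # base 60, e.g. "1:30" is 90
            value = value * 60 + int(part)
        return sign * value
    return None


samples = ["11", "0xb", "0xB", "013", "1_1", "0b1011"]
resolved = [yaml11_int(s) for s in samples]
print(resolved)
```

Note that the YAML 1.2 core schema dropped some of these forms: there, octal is written `0o13` and `013` is plain decimal 13, so the same file can mean different things depending on which version the parser implements.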
I’m sure you do not need all of these features, and I am pretty confident you have not even heard of half of them.
And yet, you need some other features that are incompatible with the language (like loops, includes, variables…)!
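Recursive merging, from the anchoring item in the list above, is a good example of such a missing feature. The sketch below uses plain Python dicts to contrast a shallow merge (which is how the YAML 1.1 merge key `<<` behaves) with a deep merge (which, as far as I understand the GitLab documentation, is what `extends` does for hashes). The job contents are hypothetical.

```python
import copy


def shallow_merge(base, override):
    """Top-level keys of override replace those of base wholesale."""
    merged = dict(base)
    merged.update(override)
    return merged


def deep_merge(base, override):
    """Nested mappings are merged key by key instead of being replaced."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical CI job definitions, already parsed into dicts.
base_job = {"script": ["make test"], "variables": {"CC": "gcc", "JOBS": "4"}}
my_job = {"variables": {"CC": "clang"}}

shallow = shallow_merge(base_job, my_job)
deep = deep_merge(base_job, my_job)

print(shallow["variables"])  # JOBS is lost
print(deep["variables"])     # JOBS survives
```

With the shallow merge, overriding a single variable silently drops every other variable of the base job, which is exactly the kind of surprise that pushes tools to bolt their own keywords on top of YAML.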
It has been demonstrated that in the Configuration Management world some brilliant minds arrived at the conclusion that a DSL was a necessity:
Mark Burgess (wikipedia), a computer scientist, created the CFEngine language after long reflection on Promise Theory
The HCL README has a “Why?” section explaining their motivations
Marin Atanasov Nikolov, who wrote Gru, explained why he chose Lua as the data description and configuration language
In Configuration Management, once I was fed up with YAML usage, there were alternatives available I could turn to.
Alas, in the Continuous world, there is no fallback… And yet, the configuration here is arguably easier than in the other (configuration management, IaaS, …) domains because:
the maximum scope of the CI/CD is (more or less) already known
all solutions already share a common subset and most of them tend towards providing the same features
Honestly, I long for the main actors to sit down together and write an RFC for an interoperable DSL…
It should be possible to write something portable with a simple subset of the language, and to identify which software implements which parts, either in their documentation or in a centralized manner (a bit like what Can I use does for browsers).
Call it a confirmation bias; but I do not feel alone raging against YAML abuse:
A few quotes from the start of https://news.ycombinator.com/item?id=20731160 (mid-2019):
From my experience, while YAML itself is something one can learn to live with, the true horror starts when people start using text template engines to generate YAML. Like it’s done in Helm charts, for example. Aren’t these “indent” filters beautiful?
We had joy, we had fun, we had seasons in the sun, but as I added more and more features and syntax to cover specific requirements and uncommon edge cases, I realized I was on an inevitable death-march towards my cute little program becoming sufficiently complicated to trigger Greenspun’s tenth rule.
> the true horror starts when people start using text template engines to generate YAML
I just had a shiver recalling a Kubernetes wrapper wrapper wapper wrapper at a former job. I think there were at least two layers of mystical YAML generation hell. I couldn’t stop it, and it tanked much joy in my work. It was a factor in me moving on.
The most frequent I encountered were DTD and more recently XSD and RELAX NG.
Other notable alternatives are:
About his creation: mgmt; a “next generation distributed, event-driven, parallel config management”.
James was an engineer at Red Hat, working on Puppet, which led him to create mgmt.
It is highly inefficient on large files, partly because it uses a DOM-like approach (loading the whole file before doing any processing).