KSON proudly claims "no whitespace sensitivity", which means "misleading indentation" is back. And it's pretty light on syntax, so there are going to be plenty of mistakes made here.
Here is an example I made in a few minutes:
Guess how it parses?
I actually prefer a syntax that is whitespace-sensitive but does not give meaning to whitespace. That is, the whitespace should not perform a semantic duty, but it should be required to be correct.
This is roughly equivalent to saying that a linter can transform the AST of the language into a canonical representation, and the syntax will be rejected unless it matches the canonical representation (modulo things like comments or whitespace-for-clarity).
This sounds like a stricter version of KSON's current warnings for misleading indentation... maybe KSON should have an opt-in feature for this. Thanks for the idea!
I'm an idiot. Never mind me
The configuration language that is the subject of the article
So, the opposite of Python?
Hmmm that's interesting. KSON actually shows a warning when there's misleading indentation, exactly to prevent this sort of thing! It seems like the detection logic only considers indents after a new line, so it doesn't complain in this case. I just opened an issue to see if things can be tightened up a bit (https://github.com/kson-org/kson/issues/221).
To see misleading indentation warnings in action, you can try the following snippet in the playground (https://kson.org/playground/), and you will get a warning:
ports:
- 80
- 8000
- 10000
- 12000
- 14000
Next to that, note that KSON already has an autoformatter, which also helps prevent misleading indentation in files.
Yeah, I'm going to say NO WAY to that.
Assuming this takes off, there would be a prettier-plugin that corrects any weird formatting.
When I think about it, any language should come with a strict, non-configurable built-in formatter anyways.
Or you can have languages which do not depend on plugins or formatters to be correct.
If your supposedly human-writable config file format is unusable without external tools, there is something wrong with it.
> any language should come with a strict, non-configurable built-in formatter
Would that be on the language, or the IDEs that support it? Seems out of scope to the language itself, but maybe I'm misunderstanding.
The language server, which in this age of LSP should come from the same project as the language itself.
Yes, the syntax makes no intuitive sense, whatsoever.
Sounds interesting as a format, but the implementation is a big supply-chain attack risk if you're not already in the JVM ecosystem.
This is because the only implementation is written in Kotlin. There are Python and Rust packages, but they both just link against the Kotlin version.
How do you build the Kotlin version? Well, let's look at the Rust package's build.rs:
https://github.com/kson-org/kson/blob/main/lib-rust/kson-sys...
It defaults to simply downloading a precompiled library from GitHub, without any hash verification.
You can instead pass an environment variable to build libkson from source. However, this will run the ./gradlew script in the repo root, which… downloads a giant OpenJDK binary from GitHub and executes it. Later in the build process it does the same for pixi and GraalVM.
The build scripts also only support a small list of platforms (Windows/Linux/macOS on x86_64/arm64), and don't seem to handle cross-compilation.
The compiled library is 2MB for me, which is actually a lot less than I was expecting, so props for that. But that's fairly heavy by Rust standards.
Glad you liked the format. I hope we can close the implementation gaps as development advances, and I'd love to see native libraries sprout for all conceivable programming languages!
Edit: point taken about verifying checksums, just created an issue for it (https://github.com/kson-org/kson/issues/222)
Past-me had hoped that by the Future Year 2025, it'd be typical to publish a parser grammar file for this kind of thing.
Both to bootstrap making a parser in a new language, and also as a kind of living spec document.
I think the current grammar should be precise enough for that, though it's embedded in the source code as a comment and not in its own file (see https://github.com/kson-org/kson/blob/857d585ef26d9f73e080b5...). It probably can't be fed verbatim into a parser generator, but anyone who reads the parser's source code should have an easy time writing a parser by hand for their programming language of choice (heck, they might even have an LLM translate the original parser into whatever language they want, once there is a comprehensive conformance test suite to validate the resulting code).
All in all, I'm confident that KSON can become ubiquitous despite the limitations of the current implementation (provided it catches on, of course).
Meanwhile, I peek at my emacs config and continue to wonder why people don't just embrace a programming language.
Yes, there are bad consequences that can happen. No, you don't dodge having problems by picking a different data format. You just pick different problems. And take away tools from the users to deal with them.
Just something like an INI file might be fine for most use-cases as well and is easier to reason about.
Elisp for Emacs, Lua for those on Neovim...
Definitely more control than guessing the right JSON, or breaking that YAML file. Plus, you get completion, introspection, and help while editing the config because you're in a code-writing environment. Bonus for having search and text manipulation tools under your fingertips instead of clicking checkboxes or tabbing through forms.
Configuration files need to be powerful programming languages (in terms of expressiveness) while being restricted (in terms of network and I/O and non-determinism). We need to aim very high for configuration languages especially when we treat them like user interfaces. Look at Cue (https://cuelang.org/), Starlark or Dhall (https://dhall-lang.org/) for inspiration, not JSON, unless your configuration file is almost always written programmatically.
Any configuration language that doesn't support strict/user/explicit types is worthless (ahem jsonnet).
The idea of configuring something but not actually having any sort of assurances that what you're configuring is correct is maddening. Building software with nothing but hopes and dreams.
Or Jsonnet (https://jsonnet.org), if you do like JSON but want less quoting.
expressiveness unfortunately usually means that while you can read the output value, you lose the ability to modify it programmatically...
I don't know that I would particularly _want_ to modify starlark programmatically, but it's certainly not impossible. Build files in bazel for instance are nicely modifiable: https://bazel.build/concepts/build-files.
JSON5 is good enough that it works for frontend devs, backend, qa, firmware, data science, chemists, optical engineers, and the hardware team, in my org at least. Interns pick up on it quickly.
The comment option gives devs enough space to explain new option flags and included objects to those familiar enough to be using it.
For customer facing configurations we build a UI.
In the kitchen sink example (https://json5.org/), they say:
> "backwardsCompatible": "with JSON",
But in that same example, they have a comment like this:
> // comments
Wouldn't that make it not compatible with JSON?
It's confusing.
From what I understand, it's "backwards-compatible" with JSON because valid JSON is also valid JSON5.
But it's not "forwards-compatible" precisely because of comments etc.
Stating `backwards compatible` as a feature is at best confusing for readers who haven't already bought into JSON5 as "the way".
If people tended to interpret English correctly and not be susceptible to misinterpreting written statements, that would be nice. Reality is a bugger!
Backwards-compatible means the new thing can handle the old things. Here JSON5 is backwards-compatible with JSON.
Forwards-compatible means the old thing can handle the new things. Here JSON is not forwards-compatible with JSON5.
Your existing JSON < 5 will work with json5, not the other way around
It’s a superset of JSON. I guess they mean it’s backwards compatible in terms of reading existing JSONs?
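To make the asymmetry concrete, here is a small sketch in Python (it assumes the third-party json5 package, which mirrors the stdlib json API; the sample strings are made up):
import json

import json5  # third-party package, assumed to expose a json-compatible loads()

plain = '{"backwardsCompatible": "with JSON"}'
extended = '{unquoted: "keys", /* comments */ trailing: "commas",}'

# Backwards compatible: a JSON5 parser accepts all existing JSON documents.
print(json5.loads(plain))

# Not forwards compatible: a plain JSON parser rejects JSON5-only features.
try:
    json.loads(extended)
except json.JSONDecodeError as err:
    print("stdlib json rejects it:", err)

print(json5.loads(extended))  # the JSON5 parser handles both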
From the application point of view, recently I'm converging on this: define data structures for your config. Ensure it can be deserialized from json and toml. (In Rust this is easy to do with Serde; in Python with Pydantic or dataclasses.) Users can start simple and write toml by hand. If you prefer KSON, sure, write KSON and render to json. If config is UI, I think the structure of the data, and names of fields and values, matter much more than the syntax. (E.g. `timeout = 300` is meaningless regardless of syntax; `timeout_ms = 300` or `timeout = "300 ms"` are self-documenting.)
When the configuration grows complex, and you feel the need to abstract and generate things, switch to a configuration language like Cue or RCL, and render to json. The application doesn't need to force a format onto the user!
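A rough sketch of that setup in Python (assuming Pydantic v2 and the stdlib tomllib; AppConfig and its fields are made-up names):
import json
import tomllib  # stdlib since Python 3.11

from pydantic import BaseModel

class AppConfig(BaseModel):
    timeout_ms: int = 300  # self-documenting, unlike a bare `timeout = 300`
    retries: int = 3

# Users can write TOML by hand...
cfg = AppConfig.model_validate(tomllib.loads("timeout_ms = 500\nretries = 5\n"))

# ...or render KSON/Cue/RCL down to JSON and load the same structures.
cfg2 = AppConfig.model_validate(json.loads('{"timeout_ms": 500, "retries": 5}'))

assert cfg == cfg2  # the application never needs to know which format it was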
We use protobuf as schemas for json config. Protobuf has builtin json support and works across languages. It’s great for multi-language projects.
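For anyone curious what that looks like in Python, a minimal sketch (the .proto and the generated module name are made up; the google.protobuf runtime is assumed):
# Hypothetical app_config.proto, compiled with protoc --python_out:
#   syntax = "proto3";
#   message AppConfig {
#     string name = 1;
#     int32 timeout_ms = 2;
#   }
from google.protobuf import json_format

import app_config_pb2  # generated module (made-up name)

cfg = app_config_pb2.AppConfig()
json_format.Parse('{"name": "demo", "timeoutMs": 250}', cfg)
print(cfg.timeout_ms)  # 250; unknown JSON fields raise ParseError by default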
This is what I have been debating using for a project at work. You can also nest messages within other messages easily with protobuf so you can aggregate/deaggregate configs as you want. Combined with protobuf validation plugins for your Lang and you get a rather neat package.
the duration one in particular bugs me. I work on a dynamic configuration system and i was super happy when we added proper duration support. we took the approach of storing in iso duration format as a string. so myconfig = `5s` then you get a duration object and can call myconfig.in_millis. so much better imo.
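A small sketch of that pattern (hand-rolled, not any particular product's API; it accepts suffix-style strings like "5s" or "250ms"):
import re
from datetime import timedelta

_UNITS = {"ms": "milliseconds", "s": "seconds", "m": "minutes", "h": "hours"}

def parse_duration(text: str) -> timedelta:
    """Parse a suffix-style duration string such as '5s' or '250ms'."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", text.strip())
    if not match:
        raise ValueError(f"not a duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})

my_config = parse_duration("5s")
print(my_config.total_seconds() * 1000)  # 5000.0, i.e. the in_millis equivalent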
I like this take! Have you used Cue or RCL yourself? How was the experience?
I’ve toyed with Cue, but never in a production setting. I like the ideas behind it, it’s very elegant that the same mechanism enables constraining values and reducing boilerplate. It’s somewhat limited compared to Jsonnet, RCL, Dhall, etc., you don’t get user-defined functions, but the flip side of that is that when you see something being defined, you can be confident that it ends up in the output like that, that it’s not just an input to a series of intractable transformations. I haven’t used it in large enough settings to get a feeling for how much that matters. Also, I find the syntax a bit ugly.
We did a prototype at work to try different configuration languages for our main IaC repository, and Cue was the one I got furthest with, but we ended up just using Python to configure things. Python is not that bad for this: the syntax is light, you get types, IDE/language server support, a full language. One downside is that it’s difficult to inspect a single piece of configuration, you run the entry point and it generates everything.
As for RCL, I use it almost daily as a jq replacement with easier to remember syntax. I also use it in some repositories to generate GitHub Actions workflows, and to keep the version numbers in sync across Cargo.toml files in a repository. I’m very pleased with it, but of course I am biased :-)
Didn't know about RCL! The project readme looks promising. I'll have to take it out for a spin. Thanks for creating it :)
Not the OP but I’m a big fan of this pattern as well.
At work we generate both k8s manifests as well as application config in YAML from a Cue source. Cue allows both deduplication, being as DRY as one can hope to be, as well as validation (like validating a value is a URL, or greater than 1, whatever).
The best part is that we have unit tests that deserialize the application config, so entire classes of problems just disappear. The generated files are committed in VCS, and spell out the entire state verbatim - no hopeless Helm junk full of mystery interpolation whose values are unknown until it’s too late. No. The entire thing becomes part of the PR workflow. A hook in CI validates that the generated files correspond to the Cue source (run make target, check if git repo has changes afterwards).
The source of truth are native structs in Go. These native Go types can be imported into Cue and used there. That means config is always up to date with the source of truth. It also means refactoring becomes relatively easy. You rename the thing on the Go side and adjust the Cue side. Very hard to mess up and most of it is automated via tooling.
The application takes almost its entire config from the file, and not from CLI arguments or env vars (shudder…). That means most things are covered by this scheme.
One downside is that the Cue tooling is rough around the edges and error messages can be useless. Other than that, I fully intend to never build applications differently anymore.
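The CI hook described above can be as small as this sketch (the make target name and the config/ path are assumptions, not the actual setup; git is expected on PATH):
import subprocess
import sys

# Regenerate the committed YAML from the Cue source (target name is made up).
subprocess.run(["make", "generate-config"], check=True)

# Fail the job if the generated files no longer match what is committed.
diff = subprocess.run(["git", "diff", "--exit-code", "--", "config/"])
if diff.returncode != 0:
    sys.exit("generated config is out of date; re-run `make generate-config`")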
Why is TOML allegedly "too minimal" for a "small configuration file"?
I've often reached out to TOML for truly minimal configuration files, so I'd say TOML makes sense in a lot of scenarios. However, my point in the article is that simple stuff eventually stops being simple, so you need to be careful.
As an example, a friend of mine introduced TOML to a reasonably big open source project a while ago. Recently, he mentioned there were some unexpected downsides to TOML. I've asked him to chime in here, because I think he's more qualified to reply (instead of a best-effort attempt from my side to faithfully reproduce what he told me).
I don't understand that either. Considering that it is both well defined and hierarchical, with none of the problems of YAML or JSON (hey, NO is a country, not a truth value), TOML seems like the perfect compromise to me.
Having now worked with terraform for 8 years, I could not agree more. Now, also because of having worked with terraform for 8 years and seeing how that's played out, I've heard and become tired of the whole "superset of json, transcribable to YAML, whitespace is not significant (which has never been a gripe of mine ever, not sure why every product cares so much about that)" promise of a silver bullet, and you very much face the same exact problems, just in different form. Terraform (HCL, to be specific) in particular can become fantastically ugly and verbose and "difficult to modify."
Configuration is difficult, the tooling is rarely the problem (at least in my experience).
Trying to jam your configuration into a generic object notation is always going to be awkward and ugly, the only real saving grace might be that it is a standard syntax.
I use a lot of OpenBSD, and one of the things I really like about it is that they care about the user interface (note 1) and take the time to build custom parsers (note 2).
Compare pf to iptables. I think, strictly speaking, iptables is the more feature-full filter language, but the devs said "we will just use getopt to parse it" and called it a day, leading to what I find one of the ugliest, most painful languages to use. pf is an absolute joy in comparison. I would pick pf for that reason alone.
note 1. Not in the sense of chasing graphical configurators, an activity I find wastes vast amounts of developer time and leads to neutered interfaces, but in the sense of taking the time to lay out a syntax that is pleasant to read and edit and then write good documentation for it.
note 2. If I am honest, semi-custom parsers: the standard operating procedure appears to be to start with a parse.y from a project that is close and mold it into the form needed, which lends itself to a nice sort of consistency in the base system.
This is why Caddy has config adapters: bring any config file language you like, and Caddy will run it. It's built into the binary and just takes a command line flag to switch languages: https://caddyserver.com/docs/config-adapters
This makes it difficult to configure Caddy in anything except the native Caddyfile language due to a lack of thorough documentation. It's an interesting idea, but configuring Caddy with a yaml config that someone prior deemed a great idea was quite painful.
Curiously, LLMs have made it a lot easier. One step away from an English adapter that routes through an LLM to generate the config.
those formats aren't bijective with each other, right? so there's no way for you to say that foo.cue can be equivalently transformed to any foo.json or any foo.nginx or whatever representation, because those transformations are necessarily lossy, no?
I keep trying to explain the difference between a configuration format and a data format. A configuration format is for humans. A data format is for computers. Try to make one format good for both, you end up with one format that sucks at both.
Another fun thing about configuration: it's a great indicator of poor software design. If the configuration can be very complicated, in one single format, in one big file, look at the source code and you'll find a big ball of mud. Or even worse, it's lots of very different software that all shares one big config file. Both are bad for humans and bad for computers.
I like where their head is at here, especially the "superset of JSON" part. Some of the things I'm not _in love_ with like the %% ending blocks or how maybe a bit of significant whitespace might make things a bit less misleading with indentation as others have said, but overall I think I like this better than YAML.
There are two configuration styles that I like:
- The key-value pair. Maybe some section marker (INI, ..). Easy to sed.
- The command kind, where the file contains the same commands that can be used elsewhere (vim, mg, sway). More suited to TUIs and bigger applications.
With these two, include statements are nice.
With all the code syntax highlighting support as a feature, I feel it will become tempting to put code in configuration files (which some of their examples show). That just feels wrong. Code should go in code files/modules/libraries, not mixed with configuration files. If your configuration starts to become code, maybe you need to rethink your software architecture. Or perhaps KSON proves that principle to be too rigid and inferior, and leads to more intelligible, manageable software. I guess we'll have to see.
KSON looks interesting. Where I work we did a metadata-type project in Pkl recently, which is somewhat similar. Unfortunately, development on the tooling front for Pkl has taken an extremely long time. Not sure the tooling/LSPs are anywhere close to what the language offers yet.
I like the language embedding feature in KSON - we would use that. Have you thought about having functions and variables? That is something you get in Pkl and Dhall, and it is useful.
Thanks for the kind words :)
This sounds like the kind of question for Daniel himself to chime in, since he has the best overview of the language's design and vision. He's not super active on HN, but I'll give him a heads up! Otherwise feel free to join our Zulip (https://kson-org.zulipchat.com) and we can chat over there.
Naming is too strict, so valid friendly YAML names don't work:
price_in_€: 42
Just tested this on the playground and observed that the following does work:
Or were you already aware of that and lamenting that KSON requires quoting in this case, compared to YAML which doesn't?
I was lamenting the bare IDs' limitation; quoting is bad here since there is no ambiguity to resolve and no type to signal. All these new, improved formats should strive to limit the bad parts of existing languages, not eliminate the good ones.
I came across https://kdl.dev recently, and it has become my favored yaml-replacement when normal code doesn't work.
The data model is closer to XML than JSON, though, so unfortunately it's not a drop-in replacement for yaml.
Small sample:
package {
  name my-pkg
  version "1.2.3"
  dependencies {
    // Nodes can have standalone values as well as key/value pairs.
    lodash "^3.2.1" optional=#true alias=underscore
  }
}
I wish they had some sort of include/composable file mechanism though
Imho the best configuration file is code written in the language itself.
Configuring TypeScript applications with the `defineConfig` pattern, which takes asynchronous callbacks and lets you compute settings in code, is very useful. And it's fully typed and programmable.
It's particularly useful because it also lets you keep a single configuration file where you only set what differs between environments: with some env === "local" check you can swap in this db dependency, turn Sentry on or off, etc.
Zig is another language that shows that configuration and building should just be code in the language itself.
> Imho the best configuration file is code written in the language itself.
This depends on the trust-model for who is doing the configuration, especially if you're trying to host for multiple tenants/customers. Truly sandboxing arbitrary code is much harder than reading XML/JSON/etc.
Programmatically manipulating it and validating it gets harder though.
Unless it's a lisp.
Specifically what you want is to provably halt and to be free of side effects. There's some configuration languages with these properties for providing expressions in JSON configs (I'm blanking on their names for the moment) which generally don't advertise themselves as Lisp dialects but boil down to a Lisp AST written in JSON. Another example is Starlark, which provides Python-like syntax.
Note that this implies a Turing incomplete language. Which makes sense - our goal is to make dangerous programs unrepresentable, so naturally we can't implement every algorithm in our restricted language.
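As a toy illustration of the "Lisp AST written in JSON" idea (my own sketch, not any specific product): expressions are plain JSON arrays, there are no loops, no user-defined functions, and no I/O, so evaluation is bounded by the size of the tree and always halts.
import json
import math

# Whitelisted, side-effect-free operations.
OPS = {
    "add": lambda *xs: sum(xs),
    "mul": lambda *xs: math.prod(xs),
    "max": lambda *xs: max(xs),
}

def evaluate(node, params):
    """Evaluate a JSON expression tree: ["op", arg, ...] or a literal."""
    if isinstance(node, list):
        op, *args = node
        if op == "var":  # ["var", "name"] reads a caller-supplied input
            return params[args[0]]
        return OPS[op](*(evaluate(arg, params) for arg in args))
    return node  # numbers and strings are literals

expr = json.loads('["add", ["mul", ["var", "replicas"], 2], 1]')
print(evaluate(expr, {"replicas": 3}))  # 7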
KSON looks neat.
I think the post is hurt by the desire to sort of… “have a theory” or take a new stance. The configuration file is obviously not a user interface, it is data. It is data that is typically edited with a text editor. The text editor is the UI. The post doesn’t really justify the idea of calling the configuration file, rather than the program used to edit it, the UI. Instead it focuses on a better standard for the data.
The advancement of standards that make the data easier to handle inside the text editor is great, though! Maybe the slightly confusing (I dare say confused) blog title will help spread the idea of kson, regardless.
Edit: another idea, one that is so obvious that nobody would write a blog post about it, is that configuring your program is part of the UX.
How can TOML be too minimal? Also, what about Apple Pickle?
A friend of mine introduced TOML to a reasonably big open source project and he mentioned there were some unexpected downsides to it. I've asked him to chime in here, because I think he's more qualified to reply (note that I pointed him to a sibling comment that's also asking about TOML, here https://news.ycombinator.com/item?id=45295715).
As for Apple Pkl, I think we share the goal of robustness and "toolability", but pkl seems way more complex. I personally think it's more useful to keep things simple, to avoid advancing in the configuration complexity clock (for context, see https://mikehadlow.blogspot.com/2012/05/configuration-comple...).
Question: if whitespace isn't significant, how does it determine ambiguity? Like, if I have a Person that has a Dog, and both have a Name attribute. If I then add a Name after my dog definition, how does it know if it's the name of the dog or the person?
That took some design work! In the end, KSON settled on something called the "end dot", which denotes the end of an object (more details in the docs, at https://kson.org/docs/syntax/#the-end-dot).
I think it is telling no programming language has settled on YAML or JSON as a syntax. Because that would drive you nuts.
But we allow it for files that tend to make production changes usually without any unit tests!
I'd prefer something syntaxed like a programming language but without turing completeness.
> I think it is telling no programming language has settled on YAML or JSON as a syntax.
It's not telling, it's impossible: neither JSON nor YAML is Turing complete. That said, the JS in JSON stands for JavaScript, whose syntax it is derived from, so at least one major language uses it in its syntax.
Azure and GitHub build pipelines are written in yaml and have conditionals, variables, template expansions, etc
GitHub Actions also have function calls, it’s just that they can only occur in very specific places in the program, and to define a function you have to create a Git repository.
And don’t forget Ansible playbooks!
Has anyone used both kson and hjson? We use hjson as a more forgiving json-like config format and really like it. kson seems more feature rich, but I'm a bit concerned about things like embedding code in configs. So any thoughts vs hjson?
I kinda want jq as a config file language. It would need some sort of nice multi-line string support, but yeah.
Came across HOCON recently and prefer it over both YAML and KSON. https://learnxinyminutes.com/hocon/
Is the K in KSON for Kotlin? Does the S stand for anything? I skimmed the docs but didn't find anything addressing the name.
Programmers don't mind "programming", but most people don't want that UI.
The people behind KSON are geniuses. No joke. I worked with Daniel. And he's an amazing human too. Congrats on launching!!!
While I agree with the general idea, 2 minutes of looking at KSON was enough for me. If this is UI, it’s an ugly one.
I’ll stick to JSON. When JSON isn’t enough it usually means the schema needs an update.
yes but to validate it you need dynamic runtime logic and therefore a live server with all the I/O glue code that entails. i.e., static types alone cannot render your tax forms
Or you just use YAML. It's a configuration language for your software, you control which parser you use which can be YAML 1.2, you can use the safe loader which can't run untrusted code, and you're parsing the values into your language's types so any type confusion will be instantly caught.
I agree that it's not perfect but worse is better and familiar is a massive win over making your users look up a new file format or set their editor up for it. If you truly hate YAML that's fine, there's plenty of other familiar formats: INI, toml, JSON.
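A sketch of that combination in Python (assuming PyYAML for the safe loader and Pydantic for the typed parsing step; ServerConfig is a made-up structure):
import yaml  # PyYAML; safe_load cannot construct arbitrary Python objects
from pydantic import BaseModel, ValidationError

class ServerConfig(BaseModel):
    host: str
    port: int

cfg = ServerConfig.model_validate(yaml.safe_load("host: example.org\nport: 8080\n"))
print(cfg.port)  # 8080, already an int

try:
    ServerConfig.model_validate(yaml.safe_load("host: example.org\nport: eighty\n"))
except ValidationError as err:
    print("type confusion caught immediately:", err)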
YamlScript (YS) adds more programmatic expressiveness to YAML. It extends an executable Clojure runtime (non-JVM, like Babashka). Created by a YAML inventor/maintainer.
https://yamlscript.org/
I'm just so tired of writing bash scripts inside the YAML. I want to be able to lint and test whatever goes into my pipelines and actions. Not fight shitty DX on top of whatever Azure spits out when it fails.
This is entirely understandable, and entirely the fault of whoever thought bash scripts belong in configuration files. If you’re trying to stuff a tiger into a desk drawer, the natural consequences are hardly the fault of the desk drawer.
But it's the situation we're in now. It's what you see in the docs and what my colleagues write as well. I entirely agree that scripting into a markup language doesn't make sense yet the inertia is there and I wish there was some way out.
if you're writing bash inside of yaml then something has gone wrong well before yaml entered the picture, this is a problem with e.g. azure not with the yaml format
> I'm just so tired of writing bash scripts inside the YAML.
Why would you be doing that?
Can't tell if asked honestly. Because that's how most platforms handle their pipelines. Terraform or Bicep let you use a declarative language for your platform. Everything else is calling cli commands or scripts from pipelines, written in YAML.
I suppose I am confused by what you mean with, "writing bash scripts".
Like, are you _literally_ scripting in the value of an item or are you just equating that they are similar?
Literal being:
get_number_of_lines:
Something like this: https://learn.microsoft.com/en-us/azure/devops/pipelines/tas...
Invariably, people will write inline scripts instead of actual scripts in their own files. There are also SDKs for most of these operations that would let you do it in code, but they are not always 100% the same as the CLI, some options are different, etc.
Yes, this is how Gitlab pipelines work. It's actually easier to just inline the script most of the time than have a bucket of misc scripts lying around. Especially since you have hooks like before/after_script which would be really awkward to externalize.
My personal opinion is: if your config needs to be so complex that you can't make do with TOML, you should just use an interpreted programming language instead. It is totally acceptable to use a config.py, for example. You get lists, dicts, and classes, all more or less known and well behaved, and people can automate away the boring stuff.
But I'd strongly encourage everybody to think about whether that deep configurability is really needed.
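A minimal sketch of the config.py approach (file layout and names are made up; the file is written here only so the snippet is self-contained):
from pathlib import Path
import importlib.util

# What the user would put in config.py: plain Python, so lists, dicts,
# comprehensions and helper functions are all available.
Path("config.py").write_text(
    "workers = 4\n"
    "hosts = [f'node-{i}.internal' for i in range(workers)]\n"
    "timeout_ms = 500\n"
)

# Application side: load config.py as a module and read its attributes.
spec = importlib.util.spec_from_file_location("user_config", "config.py")
user_config = importlib.util.module_from_spec(spec)
spec.loader.exec_module(user_config)

print(user_config.workers, user_config.hosts, user_config.timeout_ms)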
Yes all user interfaces are a key/value list.
That’s why 90% of each iOS update is just another menu or a reorganization of menus and why there are 3 different ways to access the same info/menus on iOS.
two huge thumbs down for KSON
Anecdote: I am still of the opinion that in most (~99%)[1] of the situations people are in "YAML Hell" because they put themselves in "YAML Hell".
1: I pulled that out of my butt, there's no factual data to it.