Token-efficient data formats without escaping

Prompts for Anthropic’s Claude commonly wrap content in XML-style tags, which keeps escaping to a minimum:

<content>She said "hello" to me</content>

No escaping needed for quotes. But try mapping a record to XML:

<record>
  <name>app</name>
  <version>1.0</version>
  <config>
    <port>8080</port>
  </config>
</record>

XML becomes verbose and token-heavy for nested data, because every field name appears twice. NUON (Nushell Object Notation) handles this better:

{name: app, version: "1.0", config: {port: 8080}}

Bare strings work without quotes, saving tokens.

However, NUON still escapes strings with quotes or backslashes. Say you’re reading a TOML config file:

# example.toml
name = "my-app"
version = "1.0.0"

When you serialize it to NUON:

open --raw example.toml | to nuon

Output:

"# example.toml
name = \"my-app\"
version = \"1.0.0\"
"

At least NUON doesn’t escape newlines, but quotes still get escaped. This is a problem for AI tooling, because models see tokens, not characters: \" might be one token or two, and the token boundaries don’t line up with what a human sees as the string. Find-and-replace on escaped text becomes error-prone as a result.
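The mismatch shows up even without a tokenizer. The sketch below is a rough Python stand-in for the escaping step, assuming only double quotes and backslashes are escaped, as in the output above. The helper name escape_quoted is hypothetical and not part of Nushell.

```python
def escape_quoted(text: str) -> str:
    """Wrap text in double quotes, escaping backslashes and double quotes.

    Assumption: mirrors the output shown above, where newlines stay literal.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


source = '# example.toml\nname = "my-app"\nversion = "1.0.0"\n'
serialized = escape_quoted(source)

# The line a person would search for is present in the source text...
needle = 'name = "my-app"'
found_in_source = needle in source
# ...but not in the serialized form, where every quote gained a backslash.
found_in_serialized = needle in serialized
# Extra characters: one backslash per quote, plus the two outer delimiters.
added_chars = len(serialized) - len(source)
```

A search for the text as it appears in the file fails against the serialized form, so any tool or model editing the escaped version has to reason about the backslashes as well as the content.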

SNUON: Simple NUON

SNUON (Simple NUON) is a token-efficient data format that extends NUON by using raw strings to eliminate escaping. When a string contains quotes or backslashes, SNUON uses Nushell’s raw string syntax (r#'...'#, with additional # characters on both sides when needed). Otherwise it’s identical to NUON.

Raw strings solve escaping

Instead of escaping quotes and backslashes, SNUON uses raw strings. The same example becomes:

open --raw example.toml | to snuon

Output:

r##'# example.toml
name = "my-app"
version = "1.0.0"
'##

The string is exactly what you see between r##' and '##—no escaping needed.
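How many # characters to use is the only real decision when emitting a raw string. The following Python sketch shows one possible rule, assuming one more # than the longest run of # in the content; this happens to reproduce the two-hash delimiter in the output above, but the rule SNUON actually applies is not stated here. The names to_raw_string and encode_string are hypothetical, and "quotes" is read as double quotes.

```python
import re


def to_raw_string(text: str) -> str:
    """Wrap text in a Nushell-style raw string: r#'...'# with enough hashes.

    Assumption: use one more '#' than the longest run of '#' in the text, so
    a quote followed by that many hashes can never appear inside the content.
    """
    longest_run = max((len(run) for run in re.findall(r"#+", text)), default=0)
    hashes = "#" * (longest_run + 1)
    return f"r{hashes}'{text}'{hashes}"


def encode_string(text: str) -> str:
    """Use a raw string only when the text has double quotes or backslashes."""
    if '"' in text or "\\" in text:
        return to_raw_string(text)
    # Simplification: plain NUON may also emit bare words for simple strings.
    return f'"{text}"'


config = '# example.toml\nname = "my-app"\nversion = "1.0.0"\n'
encoded = encode_string(config)
```

Because the content is copied through unchanged, the encoder never needs an escape table; it only has to pick a delimiter the content cannot close early.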

Model bias

When models see escaped strings in context, they tend to produce escaped strings in their output. SNUON avoids this because input and output use the same format:

r##'# example.toml
name = "my-app"
version = "1.0.0"
'## | str length

Output: 49

The raw string works directly in Nushell. Models see the same syntax they should generate. No translation between escaped and unescaped forms. The format biases models toward correct output.
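Reading the format back is just as direct. Here is a minimal Python sketch, assuming the delimiter shape used in this article (r, one or more #, a single quote, then a mirrored closer); parse_raw_string is a hypothetical helper, not Nushell's parser.

```python
def parse_raw_string(literal: str) -> str:
    """Return the content of a raw string literal, character for character."""
    body = literal[1:]
    hash_count = len(body) - len(body.lstrip("#"))
    opener = "r" + "#" * hash_count + "'"
    closer = "'" + "#" * hash_count
    well_formed = (
        hash_count > 0
        and len(literal) >= len(opener) + len(closer)
        and literal.startswith(opener)
        and literal.endswith(closer)
    )
    if not well_formed:
        raise ValueError("not a raw string literal")
    content = literal[len(opener):len(literal) - len(closer)]
    # A closer inside the content would have ended the string early.
    if closer in content:
        raise ValueError("content contains the closing delimiter")
    return content


literal = "r##'# example.toml\nname = \"my-app\"\nversion = \"1.0.0\"\n'##"
content = parse_raw_string(literal)
```

No unescaping pass is needed: the parser strips the delimiters and returns the rest, so the text a model writes between them is the text a tool receives.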