Welcome to the Tsonnet series!
If you're not following along, you can check out how it all started in the first post of the series.
In the previous post, I finally added type checking to Tsonnet:

Tsonnet #19 - Type checking and semantic analysis
Hercules Lemke Merscher ・ Jul 13
Now that we have the core of the type checker, I'm ready to support more complex idioms from Jsonnet in Tsonnet.
Let's start with the syntax.jsonnet
example from the Jsonnet tutorial:
/* A C-style comment. */
# A Python-style comment.
{
  cocktails: {
    // Ingredient quantities are in fl oz.
    'Tom Collins': {
      ingredients: [
        { kind: "Farmer's Gin", qty: 1.5 },
        { kind: 'Lemon', qty: 1 },
        { kind: 'Simple Syrup', qty: 0.5 },
        { kind: 'Soda', qty: 2 },
        { kind: 'Angostura', qty: 'dash' },
      ],
      garnish: 'Maraschino Cherry',
      served: 'Tall',
      description: |||
        The Tom Collins is essentially gin and
        lemonade. The bitters add complexity.
      |||,
    },
    Manhattan: {
      ingredients: [
        { kind: 'Rye', qty: 2.5 },
        { kind: 'Sweet Red Vermouth', qty: 1 },
        { kind: 'Angostura', qty: 'dash' },
      ],
      garnish: 'Maraschino Cherry',
      served: 'Straight Up',
      description: @'A clear \ red drink.',
    },
  },
}
There are still a few things missing to compile this file:
- Python-style comments
- Single-quoted strings -- we only have double-quoted strings so far
- Trailing commas in arrays and objects
- Verbatim strings
Python-style comments
I find it odd that Jsonnet supports Python-style comments in addition to C-style ones, but Tsonnet should be as compatible with Jsonnet as possible, so be it.
It's fairly easy to support this -- just one more lexing pattern for inline comments. The following diff shows the simple addition to the lexer pattern:
diff --git a/lib/lexer.mll b/lib/lexer.mll
index 49e43af..51e3b0e 100644
--- a/lib/lexer.mll
+++ b/lib/lexer.mll
@@ -16,7 +16,7 @@ let null = "null"
let bool = "true" | "false"
let letter = ['a'-'z' 'A'-'Z']
let id = (letter | '_') (letter | digit | '_')*
-let inline_comment = "//" [^ '\n']* newline
+let inline_comment = "//" [^ '\n']* newline | "#" [^ '\n']* newline
rule read =
parse
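The behavior of the extended pattern can be mimicked outside the generated lexer. This is an illustrative sketch only (the helper below is hypothetical, not part of Tsonnet): a line opens an inline comment when it starts with either marker.

```ocaml
(* Hypothetical helper mimicking the inline_comment pattern above:
   a comment starts with "//" or "#" and runs to the end of the line.
   String.starts_with is available since OCaml 4.13. *)
let is_inline_comment line =
  String.starts_with ~prefix:"//" line
  || String.starts_with ~prefix:"#" line

let () =
  assert (is_inline_comment "// C-style");
  assert (is_inline_comment "# Python-style");
  assert (not (is_inline_comment "cocktails: {"))
```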
Updating the sample file to cover the Python-style comment:
diff --git a/samples/comments/comments.jsonnet b/samples/comments/comments.jsonnet
index f040fa0..d154ad2 100644
--- a/samples/comments/comments.jsonnet
+++ b/samples/comments/comments.jsonnet
@@ -4,3 +4,4 @@ Now...
Tell me something I don't know!
¬¬
*/
+# I'm a Python-style comment
Single-quoted strings
Single-quoted strings are also straightforward. We copy the double-quoted rule and adapt the opening and closing delimiters to a single quote:
diff --git a/lib/lexer.mll b/lib/lexer.mll
index 51e3b0e..d2ea585 100644
--- a/lib/lexer.mll
+++ b/lib/lexer.mll
@@ -28,7 +28,8 @@ rule read =
| float { FLOAT (float_of_string (Lexing.lexeme lexbuf)) }
| null { NULL }
| bool { BOOL (bool_of_string (Lexing.lexeme lexbuf)) }
- | '"' { read_string (Buffer.create 16) lexbuf }
+ | '"' { read_double_quoted_string (Buffer.create 16) lexbuf }
+ | '\'' { read_single_quoted_string (Buffer.create 16) lexbuf }
| '[' { LEFT_SQR_BRACKET }
| ']' { RIGHT_SQR_BRACKET }
| '{' { LEFT_CURLY_BRACKET }
@@ -49,19 +50,39 @@ rule read =
| id { ID (Lexing.lexeme lexbuf) }
| _ { raise (SyntaxError ("Unexpected char: " ^ Lexing.lexeme lexbuf)) }
| eof { EOF }
-and read_string buf =
+and read_double_quoted_string buf =
parse
+ | '\\' '"' { Buffer.add_char buf '"'; read_double_quoted_string buf lexbuf }
+ | '\\' '\'' { Buffer.add_char buf '\''; read_double_quoted_string buf lexbuf }
+ | '\\' '/' { Buffer.add_char buf '/'; read_double_quoted_string buf lexbuf }
+ | '\\' '\\' { Buffer.add_char buf '\\'; read_double_quoted_string buf lexbuf }
+ | '\\' 'b' { Buffer.add_char buf '\b'; read_double_quoted_string buf lexbuf }
+ | '\\' 'f' { Buffer.add_char buf '\012'; read_double_quoted_string buf lexbuf }
+ | '\\' 'n' { Buffer.add_char buf '\n'; read_double_quoted_string buf lexbuf }
+ | '\\' 'r' { Buffer.add_char buf '\r'; read_double_quoted_string buf lexbuf }
+ | '\\' 't' { Buffer.add_char buf '\t'; read_double_quoted_string buf lexbuf }
| '"' { STRING (Buffer.contents buf) }
- | '\\' '/' { Buffer.add_char buf '/'; read_string buf lexbuf }
- | '\\' '\\' { Buffer.add_char buf '\\'; read_string buf lexbuf }
- | '\\' 'b' { Buffer.add_char buf '\b'; read_string buf lexbuf }
- | '\\' 'f' { Buffer.add_char buf '\012'; read_string buf lexbuf }
- | '\\' 'n' { Buffer.add_char buf '\n'; read_string buf lexbuf }
- | '\\' 'r' { Buffer.add_char buf '\r'; read_string buf lexbuf }
- | '\\' 't' { Buffer.add_char buf '\t'; read_string buf lexbuf }
| [^ '"' '\\']+
{ Buffer.add_string buf (Lexing.lexeme lexbuf);
- read_string buf lexbuf
+ read_double_quoted_string buf lexbuf
+ }
+ | _ { raise (SyntaxError ("Illegal string character: " ^ Lexing.lexeme lexbuf)) }
+ | eof { raise (SyntaxError ("String is not terminated")) }
+and read_single_quoted_string buf =
+ parse
+ | '\\' '"' { Buffer.add_char buf '"'; read_single_quoted_string buf lexbuf }
+ | '\\' '\'' { Buffer.add_char buf '\''; read_single_quoted_string buf lexbuf }
+ | '\\' '/' { Buffer.add_char buf '/'; read_single_quoted_string buf lexbuf }
+ | '\\' '\\' { Buffer.add_char buf '\\'; read_single_quoted_string buf lexbuf }
+ | '\\' 'b' { Buffer.add_char buf '\b'; read_single_quoted_string buf lexbuf }
+ | '\\' 'f' { Buffer.add_char buf '\012'; read_single_quoted_string buf lexbuf }
+ | '\\' 'n' { Buffer.add_char buf '\n'; read_single_quoted_string buf lexbuf }
+ | '\\' 'r' { Buffer.add_char buf '\r'; read_single_quoted_string buf lexbuf }
+ | '\\' 't' { Buffer.add_char buf '\t'; read_single_quoted_string buf lexbuf }
+ | '\'' { STRING (Buffer.contents buf) }
+ | [^ '\'' '\\']+
+ { Buffer.add_string buf (Lexing.lexeme lexbuf);
+ read_single_quoted_string buf lexbuf
}
| _ { raise (SyntaxError ("Illegal string character: " ^ Lexing.lexeme lexbuf)) }
| eof { raise (SyntaxError ("String is not terminated")) }
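The escape handling in both rules boils down to the same mapping from an escape character to the character it denotes. A standalone sketch of that mapping (illustrative only; the real work happens inside the generated lexer rules above):

```ocaml
(* What the read_*_quoted_string rules do with a backslash escape,
   expressed as a plain function (sketch, not Tsonnet code). *)
let decode_escape = function
  | '"' -> '"'
  | '\'' -> '\''
  | '/' -> '/'
  | '\\' -> '\\'
  | 'b' -> '\b'
  | 'f' -> '\012'  (* form feed *)
  | 'n' -> '\n'
  | 'r' -> '\r'
  | 't' -> '\t'
  | c -> failwith (Printf.sprintf "Illegal string character: \\%c" c)

let () =
  assert (decode_escape 'n' = '\n');
  assert (decode_escape '\'' = '\'')
```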
Updating the sample files accordingly:
diff --git a/samples/literals/string.jsonnet b/samples/literals/string.jsonnet
index 8effb3e..52bdb4f 100644
--- a/samples/literals/string.jsonnet
+++ b/samples/literals/string.jsonnet
@@ -1 +1 @@
-"Hello, world!"
+"Hello, world! Here's \"Tsonnet\"."
diff --git a/samples/literals/string_single_quote.jsonnet b/samples/literals/string_single_quote.jsonnet
new file mode 100644
index 0000000..7dfd97e
--- /dev/null
+++ b/samples/literals/string_single_quote.jsonnet
@@ -0,0 +1 @@
+'Hello, world! Here\'s \"Tsonnet\".'
diff --git a/test/cram/literals.t b/test/cram/literals.t
index c7b1c2a..91a4786 100644
--- a/test/cram/literals.t
+++ b/test/cram/literals.t
@@ -20,7 +20,10 @@
null
$ tsonnet ../../samples/literals/string.jsonnet
- "Hello, world!"
+ "Hello, world! Here's \"Tsonnet\"."
+
+ $ tsonnet ../../samples/literals/string_single_quote.jsonnet
+ "Hello, world! Here's \"Tsonnet\"."
$ tsonnet ../../samples/literals/array.jsonnet
[ 1, 2.0, "hi", null ]
Trailing commas
Trailing commas are allowed for objects and arrays. The parser needs a few tweaks to account for that:
diff --git a/lib/parser.mly b/lib/parser.mly
index bf88591..91fac9a 100644
--- a/lib/parser.mly
+++ b/lib/parser.mly
@@ -65,20 +65,30 @@ literal:
| b = BOOL { Bool (with_pos $startpos $endpos, b) }
| s = STRING { String (with_pos $startpos $endpos, s) }
| id = ID { Ident (with_pos $startpos $endpos, id) }
- | LEFT_SQR_BRACKET; values = list_fields; RIGHT_SQR_BRACKET { Array (with_pos $startpos $endpos, values) }
- | LEFT_CURLY_BRACKET; attrs = obj_fields; RIGHT_CURLY_BRACKET { Object (with_pos $startpos $endpos, attrs) }
+ | LEFT_SQR_BRACKET; values = array_field_list; RIGHT_SQR_BRACKET { Array (with_pos $startpos $endpos, values) }
+ | LEFT_CURLY_BRACKET; attrs = obj_field_list; RIGHT_CURLY_BRACKET { Object (with_pos $startpos $endpos, attrs) }
;
-list_fields:
- vl = separated_list(COMMA, assignable_expr) { vl };
+array_field_list:
+ | { [] }
+ | e = assignable_expr { [e] }
+ | e = assignable_expr; COMMA; es = array_field_list { e :: es }
+ ;
+
+obj_key:
+ | k = STRING { k }
+ | k = ID { k }
+ ;
obj_field:
- | k = STRING; COLON; e = assignable_expr { (k, e) }
- | k = ID; COLON; e = assignable_expr { (k, e) }
+ | k = obj_key; COLON; e = assignable_expr { (k, e) }
;
-obj_fields:
- obj = separated_list(COMMA, obj_field) { obj };
+obj_field_list:
+ | { [] }
+ | obj_field { [$1] }
+ | f = obj_field; COMMA; fs = obj_field_list { f :: fs }
+ ;
%inline number:
| i = INT { Int i }
The Menhir helper separated_list had to be replaced with hand-written rules using pattern matching here. This was required to avoid an ambiguous grammar.
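The shape those hand-written rules encode can be sketched as a tiny recursive parser over a token list: zero or more elements separated by commas, with an optional trailing comma before the closing bracket. This is an illustration of the grammar, not the Menhir-generated parser:

```ocaml
(* Sketch of the array_field_list grammar over a toy token stream. *)
type token = ELEM of int | COMMA | RBRACKET

let rec parse_elems = function
  | RBRACKET :: rest -> ([], rest)               (* empty list, or trailing comma consumed *)
  | ELEM e :: RBRACKET :: rest -> ([ e ], rest)  (* last element, no trailing comma *)
  | ELEM e :: COMMA :: rest ->                   (* element, then more elements or ']' *)
      let es, rest' = parse_elems rest in
      (e :: es, rest')
  | _ -> failwith "parse error"

let () =
  (* [1, 2,] -- trailing comma accepted *)
  assert (parse_elems [ ELEM 1; COMMA; ELEM 2; COMMA; RBRACKET ] = ([ 1; 2 ], []));
  (* [1, 2] -- still fine without it *)
  assert (parse_elems [ ELEM 1; COMMA; ELEM 2; RBRACKET ] = ([ 1; 2 ], []))
```

With separated_list, the trailing-comma variant would make the grammar ambiguous: after a comma the parser could not tell whether another element or the closing bracket follows without the explicit cases above.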
Updating the sample files accordingly:
diff --git a/samples/literals/array.jsonnet b/samples/literals/array.jsonnet
index e28fb21..a6043d7 100644
--- a/samples/literals/array.jsonnet
+++ b/samples/literals/array.jsonnet
@@ -1 +1,3 @@
-[1, 2.0, "hi", null]
+[1, 2.0, "hi",
+null, // accepts trailing comma
+]
diff --git a/samples/literals/object.jsonnet b/samples/literals/object.jsonnet
index cb4c52a..f9d9629 100644
--- a/samples/literals/object.jsonnet
+++ b/samples/literals/object.jsonnet
@@ -4,5 +4,5 @@
"string_attr": "Hello, world!",
null_attr: null,
array_attr: [1, false, {}],
- obj_attr: { "a": true, "b": false, "c": { "d": [42] } }
+ obj_attr: { "a": true, "b": false, "c": { "d": [42] } }, // accepts trailing comma
}
Verbatim strings
Verbatim strings (also called raw strings or literal strings) are string literals where escape sequences are not processed -- what you see is exactly what you get.
The lexer rule for accepting text blocks with |||
is a simplified version of the single/double-quoted string rule:
diff --git a/lib/lexer.mll b/lib/lexer.mll
index d2ea585..b6abe11 100644
--- a/lib/lexer.mll
+++ b/lib/lexer.mll
@@ -3,6 +3,12 @@
open Lexing
open Parser
exception SyntaxError of string
+
+ let verbatim_string s =
+ (String.split_on_char '\n' s)
+ |> List.drop_while (fun line -> line = "")
+ |> List.map String.trim
+ |> String.concat "\n"
}
let white = [' ' '\t']+
@@ -30,6 +36,7 @@ rule read =
| bool { BOOL (bool_of_string (Lexing.lexeme lexbuf)) }
| '"' { read_double_quoted_string (Buffer.create 16) lexbuf }
| '\'' { read_single_quoted_string (Buffer.create 16) lexbuf }
+ | "|||" { read_verbatim_string (Buffer.create 16) lexbuf }
| '[' { LEFT_SQR_BRACKET }
| ']' { RIGHT_SQR_BRACKET }
| '{' { LEFT_CURLY_BRACKET }
@@ -86,6 +93,12 @@ and read_single_quoted_string buf =
}
| _ { raise (SyntaxError ("Illegal string character: " ^ Lexing.lexeme lexbuf)) }
| eof { raise (SyntaxError ("String is not terminated")) }
+and read_verbatim_string buf =
+ parse
+ | "|||" { STRING (verbatim_string (Buffer.contents buf)) }
+ | _ as c { Buffer.add_char buf c; read_verbatim_string buf lexbuf }
+ | _ { raise (SyntaxError ("Illegal string character: " ^ Lexing.lexeme lexbuf)) }
+ | eof { raise (SyntaxError ("String is not terminated")) }
and block_comment =
parse
| "*/" { read lexbuf }
Note that the verbatim_string function processes the raw content by dropping leading empty lines and trimming the surrounding whitespace of each line, approximating Jsonnet's text block behavior of stripping the leading indentation.
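The helper can be exercised standalone. Here it is reproduced with drop_while inlined, since List.drop_while only landed in OCaml 5.3:

```ocaml
(* verbatim_string from the diff above, made self-contained. *)
let rec drop_while p = function
  | x :: xs when p x -> drop_while p xs
  | l -> l

let verbatim_string s =
  String.split_on_char '\n' s
  |> drop_while (fun line -> line = "")
  |> List.map String.trim
  |> String.concat "\n"

let () =
  (* the leading blank line after ||| is dropped; indentation is trimmed *)
  assert (verbatim_string "\n  Hi stranger\n  bye\n" = "Hi stranger\nbye\n")
```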
Adding new sample files and testing them:
diff --git a/samples/errors/malformed_verbatim_string.jsonnet b/samples/errors/malformed_verbatim_string.jsonnet
new file mode 100644
index 0000000..c2b2d8a
--- /dev/null
+++ b/samples/errors/malformed_verbatim_string.jsonnet
@@ -0,0 +1,3 @@
+local s = |||
+;
+s
diff --git a/samples/literals/string_raw.jsonnet b/samples/literals/string_raw.jsonnet
new file mode 100644
index 0000000..3ccfa66
--- /dev/null
+++ b/samples/literals/string_raw.jsonnet
@@ -0,0 +1,7 @@
+local s = |||
+ Hi stranger, this is a
+ multi-line verbatim string,
+ also called raw-string or
+ literal string.
+|||;
+s
diff --git a/test/cram/errors.t b/test/cram/errors.t
index 0fcc7ad..878f0c6 100644
--- a/test/cram/errors.t
+++ b/test/cram/errors.t
@@ -12,6 +12,13 @@
^^^^^^^^^^^^^^^^^^^^^
[1]
+ $ tsonnet ../../samples/errors/malformed_verbatim_string.jsonnet
+ ../../samples/errors/malformed_verbatim_string.jsonnet:1:18 String is not terminated
+
+ 1: local s = |||
+ ^^^^^^^^^^^^^
+ [1]
+
$ tsonnet ../../samples/errors/sum_int_to_boolean.jsonnet
../../samples/errors/sum_int_to_boolean.jsonnet:1:0 Invalid binary operation
diff --git a/test/cram/literals.t b/test/cram/literals.t
index 91a4786..4a919d5 100644
--- a/test/cram/literals.t
+++ b/test/cram/literals.t
@@ -25,6 +25,9 @@
$ tsonnet ../../samples/literals/string_single_quote.jsonnet
"Hello, world! Here's \"Tsonnet\"."
+ $ tsonnet ../../samples/literals/string_raw.jsonnet
+ "Hi stranger, this is a\nmulti-line verbatim string,\nalso called raw-string or\nliteral string.\n"
+
$ tsonnet ../../samples/literals/array.jsonnet
[ 1, 2.0, "hi", null ]
Jsonnet also has single-line verbatim strings, prefixed with @:
diff --git a/lib/lexer.mll b/lib/lexer.mll
index b6abe11..92b4d58 100644
--- a/lib/lexer.mll
+++ b/lib/lexer.mll
@@ -4,6 +4,10 @@
open Parser
exception SyntaxError of string
+ let string_not_terminated = SyntaxError ("String is not terminated")
+
+ let illegal_string_char invalid = SyntaxError ("Illegal string character: " ^ invalid)
+
let verbatim_string s =
(String.split_on_char '\n' s)
|> List.drop_while (fun line -> line = "")
@@ -37,6 +41,8 @@ rule read =
| '"' { read_double_quoted_string (Buffer.create 16) lexbuf }
| '\'' { read_single_quoted_string (Buffer.create 16) lexbuf }
| "|||" { read_verbatim_string (Buffer.create 16) lexbuf }
+ | "@\"" { read_single_line_verbatim_double_quoted_string (Buffer.create 16) lexbuf }
+ | "@'" { read_single_line_verbatim_single_quoted_string (Buffer.create 16) lexbuf }
| '[' { LEFT_SQR_BRACKET }
| ']' { RIGHT_SQR_BRACKET }
| '{' { LEFT_CURLY_BRACKET }
@@ -73,8 +79,8 @@ and read_double_quoted_string buf =
{ Buffer.add_string buf (Lexing.lexeme lexbuf);
read_double_quoted_string buf lexbuf
}
- | _ { raise (SyntaxError ("Illegal string character: " ^ Lexing.lexeme lexbuf)) }
- | eof { raise (SyntaxError ("String is not terminated")) }
+ | _ { raise (illegal_string_char (Lexing.lexeme lexbuf)) }
+ | eof { raise string_not_terminated }
and read_single_quoted_string buf =
parse
| '\\' '"' { Buffer.add_char buf '"'; read_single_quoted_string buf lexbuf }
@@ -91,14 +97,26 @@ and read_single_quoted_string buf =
{ Buffer.add_string buf (Lexing.lexeme lexbuf);
read_single_quoted_string buf lexbuf
}
- | _ { raise (SyntaxError ("Illegal string character: " ^ Lexing.lexeme lexbuf)) }
- | eof { raise (SyntaxError ("String is not terminated")) }
+ | _ { raise (illegal_string_char (Lexing.lexeme lexbuf)) }
+ | eof { raise string_not_terminated }
+and read_single_line_verbatim_single_quoted_string buf =
+ parse
+ | '\'' { STRING (verbatim_string (Buffer.contents buf)) }
+ | _ as c { Buffer.add_char buf c; read_single_line_verbatim_single_quoted_string buf lexbuf }
+ | _ { raise (illegal_string_char (Lexing.lexeme lexbuf)) }
+ | eof { raise string_not_terminated }
+and read_single_line_verbatim_double_quoted_string buf =
+ parse
+ | '"' { STRING (verbatim_string (Buffer.contents buf)) }
+ | _ as c { Buffer.add_char buf c; read_single_line_verbatim_double_quoted_string buf lexbuf }
+ | _ { raise (illegal_string_char (Lexing.lexeme lexbuf)) }
+ | eof { raise string_not_terminated }
and read_verbatim_string buf =
parse
| "|||" { STRING (verbatim_string (Buffer.contents buf)) }
| _ as c { Buffer.add_char buf c; read_verbatim_string buf lexbuf }
- | _ { raise (SyntaxError ("Illegal string character: " ^ Lexing.lexeme lexbuf)) }
- | eof { raise (SyntaxError ("String is not terminated")) }
+ | _ { raise (illegal_string_char (Lexing.lexeme lexbuf)) }
+ | eof { raise string_not_terminated }
and block_comment =
parse
| "*/" { read lexbuf }
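The essential property of these rules is that a backslash is just another character: there is no escape case, so it passes through untouched. A sketch of that copy-until-quote loop (illustrative, not the generated lexer):

```ocaml
(* Sketch of read_single_line_verbatim_single_quoted_string: copy every
   character into the buffer until the closing quote, no escape handling. *)
let read_verbatim chars =
  let buf = Buffer.create 16 in
  let rec go = function
    | [] -> failwith "String is not terminated"
    | '\'' :: _ -> Buffer.contents buf
    | c :: rest -> Buffer.add_char buf c; go rest
  in
  go chars

let () =
  let input = "A clear \\ red drink.'" in
  let chars = List.init (String.length input) (String.get input) in
  (* the backslash survives verbatim *)
  assert (read_verbatim chars = "A clear \\ red drink.")
```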
Adding new sample files and testing them:
diff --git a/samples/errors/malformed_verbatim_string_single_line.jsonnet b/samples/errors/malformed_verbatim_string_single_line.jsonnet
new file mode 100644
index 0000000..3ed0953
--- /dev/null
+++ b/samples/errors/malformed_verbatim_string_single_line.jsonnet
@@ -0,0 +1,2 @@
+local s = @'hello...;
+s
diff --git a/samples/literals/string_raw_single_line.jsonnet b/samples/literals/string_raw_single_line.jsonnet
new file mode 100644
index 0000000..96d772b
--- /dev/null
+++ b/samples/literals/string_raw_single_line.jsonnet
@@ -0,0 +1,2 @@
+local s = @"Hello, stranger!";
+s
diff --git a/samples/literals/string_raw_single_line_single_quote.jsonnet b/samples/literals/string_raw_single_line_single_quote.jsonnet
new file mode 100644
index 0000000..852e280
--- /dev/null
+++ b/samples/literals/string_raw_single_line_single_quote.jsonnet
@@ -0,0 +1,2 @@
+local s = @'Hello, stranger!';
+s
diff --git a/test/cram/errors.t b/test/cram/errors.t
index 878f0c6..96e0a91 100644
--- a/test/cram/errors.t
+++ b/test/cram/errors.t
@@ -19,6 +19,13 @@
^^^^^^^^^^^^^
[1]
+ $ tsonnet ../../samples/errors/malformed_verbatim_string_single_line.jsonnet
+ ../../samples/errors/malformed_verbatim_string_single_line.jsonnet:1:24 String is not terminated
+
+ 1: local s = @'hello...;
+ ^^^^^^^^^^^^^^^^^^^^^
+ [1]
+
$ tsonnet ../../samples/errors/sum_int_to_boolean.jsonnet
../../samples/errors/sum_int_to_boolean.jsonnet:1:0 Invalid binary operation
diff --git a/test/cram/literals.t b/test/cram/literals.t
index 4a919d5..6b8d177 100644
--- a/test/cram/literals.t
+++ b/test/cram/literals.t
@@ -28,6 +28,12 @@
$ tsonnet ../../samples/literals/string_raw.jsonnet
"Hi stranger, this is a\nmulti-line verbatim string,\nalso called raw-string or\nliteral string.\n"
+ $ tsonnet ../../samples/literals/string_raw_single_line.jsonnet
+ "Hello, stranger!"
+
+ $ tsonnet ../../samples/literals/string_raw_single_line_single_quote.jsonnet
+ "Hello, stranger!"
+
$ tsonnet ../../samples/literals/array.jsonnet
[ 1, 2.0, "hi", null ]
Putting it all together
Now let's test our enhanced Tsonnet with the original syntax.jsonnet
example:
$ dune exec -- tsonnet samples/tutorials/syntax.jsonnet
{
  "cocktails": {
    "Tom Collins": {
      "ingredients": [
        { "kind": "Farmer's Gin", "qty": 1.5 },
        { "kind": "Lemon", "qty": 1 },
        { "kind": "Simple Syrup", "qty": 0.5 },
        { "kind": "Soda", "qty": 2 },
        { "kind": "Angostura", "qty": "dash" }
      ],
      "garnish": "Maraschino Cherry",
      "served": "Tall",
      "description": "The Tom Collins is essentially gin and\nlemonade. The bitters add complexity.\n"
    },
    "Manhattan": {
      "ingredients": [
        { "kind": "Rye", "qty": 2.5 },
        { "kind": "Sweet Red Vermouth", "qty": 1 },
        { "kind": "Angostura", "qty": "dash" }
      ],
      "garnish": "Maraschino Cherry",
      "served": "Straight Up",
      "description": "A clear \\ red drink."
    }
  }
}
Voilà!
With these additions, Tsonnet now supports a much richer set of Jsonnet syntax features, bringing us closer to full compatibility. These syntactic sugar features might seem small individually, but they're crucial for real-world usability -- after all, who wants to escape every quote in a complex string or worry about trailing commas breaking their configuration?
The journey from a basic interpreter to a fully-featured configuration language continues, one commit at a time.
Thanks for reading Bit Maybe Wise! Subscribe to build a configuration language that's almost as sweet as the cocktails in the examples.
Photo by Myriam Zilles on Unsplash