Tsonnet #14 - Where’s Wally? Adding parsing error tracing
Hercules Lemke Merscher


Welcome to the Tsonnet series!

If you're just joining, you can check out how it all started in the first post of the series.

In the previous post, we added lexing error tracing.

This time, we will expand it and cover parsing error tracing.

(Image: Where's Wally? Photo by BBiDDac on Unsplash)

A parsing error

Let’s add an extremely simple sample file introducing a parsing error:

diff --git a/samples/errors/sum_int_to_boolean.jsonnet b/samples/errors/sum_int_to_boolean.jsonnet
new file mode 100644
index 0000000..571e0b4
--- /dev/null
+++ b/samples/errors/sum_int_to_boolean.jsonnet
@@ -0,0 +1 @@
+42 + true

Even though this example is pretty simple, it serves us well to pinpoint the parsing error. We have a number and a boolean, which are lexed correctly, but the expression is invalid; we can’t add a number to a boolean.

Jsonnet reports the line and column range correctly, but there's no visual cue to help us:

$ jsonnet samples/errors/sum_int_to_boolean.jsonnet
RUNTIME ERROR: Unexpected type boolean, expected number
    samples/errors/sum_int_to_boolean.jsonnet:1:1-10
    During evaluation

But we want to do better on this front and add human-friendly error messages to Tsonnet.

Let’s also stretch a little bit and add a multiline sample file:

diff --git a/samples/errors/sum_int_to_boolean_multiline.jsonnet b/samples/errors/sum_int_to_boolean_multiline.jsonnet
new file mode 100644
index 0000000..06d20e6
--- /dev/null
+++ b/samples/errors/sum_int_to_boolean_multiline.jsonnet
@@ -0,0 +1,4 @@
+(1+1+1+1)+
+2+2+2+2+
+3+3+3+false+
+4+4+4+4

Since a parsing error can arise anywhere in a file that lexed perfectly well, we need to make sure parser errors are captured at the right position.

Now that we have samples to test against, we need a way to record where each token was found. Let's see how to capture parsing positions.

Capturing parsing positions

The first file we need to update is ast.ml, where we need to capture the position in the AST:

diff --git a/lib/ast.ml b/lib/ast.ml
index 00ead06..ed830ad 100644
--- a/lib/ast.ml
+++ b/lib/ast.ml
@@ -14,13 +14,28 @@ type number =
   | Int of int
   | Float of float

-type expr =
+type value =
   | Number of number
   | Null
   | Bool of bool
   | String of string
   | Ident of string
-  | Array of expr list
-  | Object of (string * expr) list
-  | BinOp of bin_op * expr * expr
-  | UnaryOp of unary_op * expr
+  | Array of value list
+  | Object of (string * value) list
+  | BinOp of bin_op * value * value
+  | UnaryOp of unary_op * value
+
+type position = {
+  startpos: Lexing.position;
+  endpos: Lexing.position;
+}
+
+type expr = {
+  position: position;
+  value: value;
+}
+
+let pos_from_lexbuf (lexbuf : Lexing.lexbuf) : position =
+  { startpos = lexbuf.lex_curr_p;
+    endpos = lexbuf.lex_curr_p;
+  }

It is important to have both the start and end positions of the expression. The new position type should be part of the expr.

So far, expr was a sum type containing all the value variants that could appear in the code. Now it becomes a product type containing a position and a value.
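
To make the new shape concrete, here is roughly how 42 + true is represented under the new types. This is an illustrative sketch only: Lexing.dummy_pos stands in for the real positions, which the parser fills in from the lexer.

open Ast

(* Sketch: the AST for 42 + true under the new types.
   Lexing.dummy_pos is a stdlib placeholder position; in practice
   the parser records real start/end positions. *)
let sample_position : position =
  { startpos = Lexing.dummy_pos; endpos = Lexing.dummy_pos }

let sample : expr =
  { position = sample_position;
    value = BinOp (Add, Number (Int 42), Bool true);
  }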

The pos_from_lexbuf function will come in handy to capture the position for lexing errors.

Now we need to do some refactoring-driven development to update everything that references expr to use value where appropriate.

The json.ml change is straightforward; we just replace expr references with value:

diff --git a/lib/json.ml b/lib/json.ml
index 26d930c..cd65ec0 100644
--- a/lib/json.ml
+++ b/lib/json.ml
@@ -1,7 +1,7 @@
 open Ast
 open Result

-let rec expr_to_yojson : expr -> (Yojson.t, string) result = function
+let rec value_to_yojson : Ast.value -> (Yojson.t, string) result = function
   | Number n ->
     ok (match n with
     | Int i -> `Int i
@@ -11,12 +11,12 @@ let rec expr_to_yojson : expr -> (Yojson.t, string) result = function
   | String s -> ok (`String s)
   | Ident id -> ok (`String id)
   | Array values ->
-    let expr_to_list expr' = to_list (expr_to_yojson expr') in
+    let expr_to_list expr' = to_list (value_to_yojson expr') in
     let results = values |> List.map expr_to_list |> List.concat in
     ok (`List results)
   | Object attrs ->
     let eval' = fun (k, v) ->
-      let result = expr_to_yojson v
+      let result = value_to_yojson v
       in Result.map (fun val' -> (k, val')) result
     in
     let results = attrs |> List.map eval' |> List.map to_list |> List.concat
@@ -24,5 +24,5 @@ let rec expr_to_yojson : expr -> (Yojson.t, string) result = function
   | _ -> error "value type not representable as JSON"

 let expr_to_string expr =
-  let yojson = expr_to_yojson expr
+  let yojson = value_to_yojson expr.value
   in Result.map Yojson.pretty_to_string yojson

Now, let’s see how to capture the positions in the parser.

Parsing positions

Menhir provides some helpers; $startpos and $endpos are the ones we are interested in, each returning a Lexing.position.
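
For reference, Lexing.position is the standard-library record carrying the file name, line number, and character offsets; the column within a line is derived as pos_cnum - pos_bol:

(* OCaml stdlib: the Lexing.position record (shown for reference) *)
type position = {
  pos_fname : string; (* file name *)
  pos_lnum : int;     (* line number, 1-based *)
  pos_bol : int;      (* offset of the beginning of the current line *)
  pos_cnum : int;     (* offset of the position itself *)
}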

We can wrap the record building in the with_pos function to make the parser cleaner. But we’ll need to update every single rule in expr to use it:

diff --git a/lib/parser.mly b/lib/parser.mly
index 0167984..6292eef 100644
--- a/lib/parser.mly
+++ b/lib/parser.mly
@@ -1,5 +1,14 @@
 %{
   [@@@coverage exclude_file]
+  open Ast
+
+  let with_pos startpos endpos value : Ast.expr = {
+    position = {
+      startpos = startpos;
+      endpos = endpos;
+    };
+    value = value
+  }
 %}

 %token <int> INT
@@ -32,31 +41,34 @@ prog:
   ;

 expr:
-  | i = INT { Number (Int i) }
-  | f = FLOAT { Number (Float f) }
-  | NULL { Null }
-  | b = BOOL { Bool b }
-  | s = STRING { String s }
-  | id = ID { Ident id }
-  | LEFT_PAREN; e = expr; RIGHT_PAREN { e }
-  | LEFT_SQR_BRACKET; values = list_fields; RIGHT_SQR_BRACKET { Array values }
-  | LEFT_CURLY_BRACKET; attrs = obj_fields; RIGHT_CURLY_BRACKET { Object attrs }
-  | e1 = expr; PLUS; e2 = expr { BinOp (Add, e1, e2) }
-  | e1 = expr; MINUS; e2 = expr { BinOp (Subtract, e1, e2) }
-  | e1 = expr; MULTIPLY; e2 = expr { BinOp (Multiply, e1, e2) }
-  | e1 = expr; DIVIDE; e2 = expr { BinOp (Divide, e1, e2) }
-  | PLUS; e = expr; { UnaryOp (Plus, e) }
-  | MINUS; e = expr; { UnaryOp (Minus, e) }
-  | NOT; e = expr; { UnaryOp (Not, e) }
-  | BITWISE_NOT; e = expr; { UnaryOp (BitwiseNot, e) }
+  | i = INT { with_pos $startpos $endpos (Number (Int i)) }
+  | f = FLOAT { with_pos $startpos $endpos (Number (Float f)) }
+  | NULL { with_pos $startpos $endpos Null }
+  | b = BOOL { with_pos $startpos $endpos (Bool b) }
+  | s = STRING { with_pos $startpos $endpos (String s) }
+  | id = ID { with_pos $startpos $endpos (Ident id) }
+  | LEFT_PAREN; e = expr; RIGHT_PAREN { with_pos $startpos $endpos e.value }
+  | LEFT_SQR_BRACKET; values = list_fields; RIGHT_SQR_BRACKET { with_pos $startpos $endpos (Array values) }
+  | LEFT_CURLY_BRACKET; attrs = obj_fields; RIGHT_CURLY_BRACKET { with_pos $startpos $endpos (Object attrs) }
+  | e1 = expr; PLUS; e2 = expr { with_pos $startpos $endpos (BinOp (Add, e1.value, e2.value)) }
+  | e1 = expr; MINUS; e2 = expr { with_pos $startpos $endpos (BinOp (Subtract, e1.value, e2.value)) }
+  | e1 = expr; MULTIPLY; e2 = expr { with_pos $startpos $endpos (BinOp (Multiply, e1.value, e2.value)) }
+  | e1 = expr; DIVIDE; e2 = expr { with_pos $startpos $endpos (BinOp (Divide, e1.value, e2.value)) }
+  | PLUS; e = expr; { with_pos $startpos $endpos (UnaryOp (Plus, e.value)) }
+  | MINUS; e = expr; { with_pos $startpos $endpos (UnaryOp (Minus, e.value)) }
+  | NOT; e = expr; { with_pos $startpos $endpos (UnaryOp (Not, e.value)) }
+  | BITWISE_NOT; e = expr; { with_pos $startpos $endpos (UnaryOp (BitwiseNot, e.value)) }
   ;

+list_value:
+  e = expr { e.value };
+
 list_fields:
-  vl = separated_list(COMMA, expr) { vl };
+  vl = separated_list(COMMA, list_value) { vl };

 obj_field:
-  | k = STRING; COLON; v = expr { (k, v) }
-  | k = ID; COLON; v = expr { (k, v) }
+  | k = STRING; COLON; e = expr { (k, e.value) }
+  | k = ID; COLON; e = expr { (k, e.value) }
   ;

 obj_fields:

Helper rules such as list_value, list_fields, obj_field, and obj_fields don't need to include the position. At least, not yet. Maybe soon, when the type checker is implemented, we'll need it to highlight errors in an expression's inner tokens. Let's postpone this until it's needed, if it's needed at all.

Many lines of the parser changed, but the change itself couldn't be simpler.

Until now, error handling has been exercised by the interpreter, but it's better to isolate it in its own module. Let's do that before continuing.

Introducing the error module

With an error.ml module, we can encapsulate everything related to error formatting, which makes it easier to reason about in a contained context:

open Ast
open Result

let enumerate_error_lines filename position ~highlight_error =
  let channel = open_in filename in
  try
    let eaten_chars = ref 0 in

    let rec read_lines acc line_num =
      try
        let line = input_line channel in
        let numbered_line = Printf.sprintf "%d: %s" line_num line in

        let is_error_line = line_num >= position.startpos.pos_lnum && line_num <= position.endpos.pos_lnum in
        let eat_more_chars = position.endpos.pos_cnum > !eaten_chars in

        if is_error_line && eat_more_chars then
          read_lines (highlight_error line_num line position eaten_chars :: numbered_line :: acc) (line_num + 1)
        else
          read_lines acc (line_num + 1)
      with End_of_file -> List.rev acc
    in
    let numbered_lines = read_lines [] 1 in
    close_in channel;
    ok (String.concat "\n" numbered_lines)
  with e ->
    close_in_noerr channel;
    error (Printexc.to_string e)

let plot_caret line_num line pos (eaten_chars: int ref) =
  let line_length = String.length line in
  let left_padding = String.length (string_of_int line_num) + 2 in (* width of line number + colon + space, e.g.: "%d: "... *)
  let buf_size = left_padding + line_length in
  let buffer = Buffer.create buf_size in

  let start' = min pos.startpos.pos_cnum !eaten_chars in
  let end' = min pos.endpos.pos_cnum (line_length-1) in

  (* Fill with spaces for the left padding (line number area) *)
  for _ = 1 to left_padding do
    Buffer.add_char buffer ' ';
    incr eaten_chars;
  done;
  (* Add spaces up to the starting position *)
  for _ = 1 to start' do
    Buffer.add_char buffer ' ';
    incr eaten_chars;
  done;
  (* Add caret to highlight expr between start and end columns *)
  for _ = start' to end' do
    Buffer.add_char buffer '^';
    incr eaten_chars;
  done;

  incr eaten_chars; (* +1 for newline *)

  let carets = Buffer.contents buffer in
  (Printf.sprintf "%*s" left_padding carets)

let trace_file_position err pos =
  let start_col = pos.startpos.pos_cnum - pos.startpos.pos_bol in
  Printf.sprintf "%s:%d:%d %s\n" pos.startpos.pos_fname pos.startpos.pos_lnum start_col err

let trace (err: string) (pos: position) : (string, string) result =
  bind
    (enumerate_error_lines pos.startpos.pos_fname pos ~highlight_error: plot_caret)
    (fun content -> ok (Printf.sprintf "%s\n%s" (trace_file_position err pos) content))

The error.mli interface file has been added to expose only the trace function from the Error module; everything else stays private:

val trace : string -> Ast.position -> (string, string) result

Some new things have been introduced here, compared to the previous format_error function:

  1. It handles multiline files.
  2. Lines without errors are skipped. Showing irrelevant lines only adds noise.
  3. It has clever logic to highlight the error from the expr's start column to its end column.

Cool, huh?!
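
To get a feel for the API before wiring it in, here is a hypothetical snippet (report is illustrative, not part of Tsonnet; the real call sites are in the interpreter below):

(* Hypothetical helper: print a traced error to stderr.
   Error.trace returns Ok with the formatted trace, or Error when,
   for instance, the source file cannot be read. *)
let report (msg : string) (pos : Ast.position) : unit =
  match Error.trace msg pos with
  | Ok traced -> prerr_endline traced
  | Error err -> prerr_endline err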

Let’s proceed to connect the pieces.

Connecting all the pieces

The interpreter needs a few tweaks:

  1. The parse function needs to be refactored to use the new Error.trace function.
  2. Functions such as interpret_arith_op, interpret_unary_op, and interpret need to change to work with the value field of expr, using the record-update idiom sketched right after this list.
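
OCaml's functional record update, { expr with value = v }, builds a new record that keeps every field of expr, including its position, and swaps in a new value. The interpreter diff leans on it heavily. A minimal sketch:

(* Functional record update: copy e, replacing only the value field.
   The position field travels along untouched. *)
let replace_value (e : Ast.expr) (v : Ast.value) : Ast.expr =
  { e with value = v }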

The diff might look daunting, but the complexity hasn't increased much:

diff --git a/lib/tsonnet.ml b/lib/tsonnet.ml
index e8f6304..3907770 100644
--- a/lib/tsonnet.ml
+++ b/lib/tsonnet.ml
@@ -4,49 +4,6 @@ open Result
 let (let*) = Result.bind
 let (>>=) = Result.bind

-let enumerate_file_content filename =
-  let channel = open_in filename in
-  try
-    let rec read_lines acc line_num =
-      try
-        let line = input_line channel in
-        let numbered_line = Printf.sprintf "%d %s" line_num line in
-        read_lines (numbered_line :: acc) (line_num + 1)
-      with End_of_file -> (List.rev acc, line_num)
-    in
-    let numbered_lines, line_num = read_lines [] 1 in
-    close_in channel;
-    ok (String.concat "\n" numbered_lines, line_num)
-  with e ->
-    close_in_noerr channel;
-    error (Printexc.to_string e)
-
-let plot_caret column_size =
-  if column_size <= 0 then
-    ""
-  else
-    let buffer = Buffer.create column_size in
-    (* Fill with spaces except the last position *)
-    for _ = 1 to column_size - 1 do
-      Buffer.add_char buffer ' '
-    done;
-    (* Add caret at the end *)
-    Buffer.add_char buffer '^';
-    Buffer.contents buffer
-
-let format_error (err: string) (lexbuf: Lexing.lexbuf) : (string, string) result =
-  let* content, n = enumerate_file_content lexbuf.lex_curr_p.pos_fname in
-  let pos_cnum = lexbuf.lex_curr_p.pos_cnum - lexbuf.lex_curr_p.pos_bol in
-  let carot_padding = String.length (string_of_int n) + 1 in
-  ok (Printf.sprintf "%s:%d:%d %s\n\n%s\n %*s"
-    lexbuf.lex_curr_p.pos_fname
-    lexbuf.lex_curr_p.pos_lnum
-    pos_cnum
-    err
-    content
-    carot_padding (plot_caret pos_cnum)
-  )
-
 (** [parse s] parses [s] into an AST. *)
 let parse (filename: string) : (expr, string) result  =
   let input = open_in filename in
@@ -54,12 +11,12 @@ let parse (filename: string) : (expr, string) result  =
   Lexing.set_filename lexbuf filename;
   let result =
     try ok (Parser.prog Lexer.read lexbuf)
-    with | Lexer.SyntaxError err -> (format_error err lexbuf) >>= error
+    with | Lexer.SyntaxError err -> (Error.trace err (Ast.pos_from_lexbuf lexbuf)) >>= error
   in
   close_in input;
   result

-let interpret_arith_op (op: bin_op) (n1: number) (n2: number) : expr =
+let interpret_arith_op (op: bin_op) (n1: number) (n2: number) : Ast.value =
   match op, n1, n2 with
   | Add, (Int a), (Int b) -> Number (Int (a + b))
   | Add, (Float a), (Int b) -> Number (Float (a +. (float_of_int b)))
@@ -79,38 +36,48 @@ let interpret_arith_op (op: bin_op) (n1: number) (n2: number) : expr =
   | Divide, (Float a), (Float b) -> Number (Float (a /. b))

 let interpret_concat_op (e1 : expr) (e2 : expr) : (expr, string) result =
-  match e1, e2 with
-  | String s1, String s2 -> ok (String (s1^s2))
-  | String s1, expr2 ->
-    let* s2 = Json.expr_to_string expr2 in
-    ok (String (s1^s2))
-  | expr1, String s2 ->
-    let* s1 = Json.expr_to_string expr1 in
-    ok (String (s1^s2))
-  | _ -> error "invalid string concatenation operation"
+  let value =
+    match e1.value, e2.value with
+    | String s1, String s2 -> ok (String (s1^s2))
+    | String s1, val2 ->
+      let* s2 = Json.expr_to_string {e2 with value = val2} in
+      ok (String (s1^s2))
+    | val1, String s2 ->
+      let* s1 = Json.expr_to_string {e1 with value = val1} in
+      ok (String (s1^s2))
+    | _ -> error "invalid string concatenation operation"
+    in Result.map (fun v -> { e1 with value = v }) value

 let interpret_unary_op (op: unary_op) (evaluated_expr: expr)  =
-    match op, evaluated_expr with
-    | Plus, number -> ok number
-    | Minus, Number (Int i) -> ok (Number (Int (-i)))
-    | Minus, Number (Float f) -> ok (Number (Float (-. f)))
-    | Not, (Bool b) -> ok (Bool (not b))
-    | BitwiseNot, Number (Int i) -> ok (Number (Int (lnot i)))
-    | _ -> error "invalid unary operation"
+    let* new_value = (match op, evaluated_expr.value with
+                    | Plus, number -> ok number
+                    | Minus, Number (Int i) -> ok (Number (Int (-i)))
+                    | Minus, Number (Float f) -> ok (Number (Float (-. f)))
+                    | Not, (Bool b) -> ok (Bool (not b))
+                    | BitwiseNot, Number (Int i) -> ok (Number (Int (lnot i)))
+                    | _ -> error "invalid unary operation")
+    in ok { evaluated_expr with value = new_value }

 (** [interpret expr] interprets and reduces the intermediate AST [expr] into a result AST. *)
-let rec interpret (e: expr) : (expr, string) result =
-  match e with
-  | Null | Bool _ | String _ | Number _ | Array _ | Object _ | Ident _ -> ok e
+let rec interpret expr : (expr, string) result =
+  match expr.value with
+  | Null | Bool _ | String _ | Number _ | Array _ | Object _ | Ident _ -> ok expr
   | BinOp (op, e1, e2) ->
-    (let* e1' = interpret e1 in
-    let* e2' = interpret e2 in
-    match op, e1', e2' with
-    | Add, (String _ as expr1), (_ as expr2) | Add, (_ as expr1), (String _ as expr2) ->
-      interpret_concat_op expr1 expr2
-    | _, Number v1, Number v2 -> ok (interpret_arith_op op v1 v2)
-    | _ -> error "invalid binary operation")
-  | UnaryOp (op, expr) -> interpret expr >>= interpret_unary_op op
+    (let* e1' = interpret ({expr with value = e1}) in
+    let* e2' = interpret ({expr with value = e2}) in
+    match op, e1'.value, e2'.value with
+    | Add, (String _ as v1), (_ as v2) | Add, (_ as v1), (String _ as v2) ->
+      let expr1 = { expr with value = v1 }
+      in let expr2 = { expr with value = v2 }
+      in interpret_concat_op expr1 expr2
+    | _, Number v1, Number v2 ->
+      let value = interpret_arith_op op v1 v2
+      in ok ({expr with value = value})
+    | _ ->
+      Error.trace "invalid binary operation" expr.position >>= error
+    )
+  | UnaryOp (op, value) ->
+    interpret ({expr with value = value}) >>= interpret_unary_op op

 let run (filename: string) : (string, string) result =
   parse filename >>= interpret >>= Json.expr_to_string

Let’s update the tests:

  1. Add a new cram test against samples/errors/sum_int_to_boolean.jsonnet.
  2. Add a new cram test against samples/errors/sum_int_to_boolean_multiline.jsonnet.
  3. Run dune runtest.
  4. Run dune promote.
diff --git a/test/cram/comments.t b/test/cram/comments.t
index 136fc36..c43bb32 100644
--- a/test/cram/comments.t
+++ b/test/cram/comments.t
@@ -4,17 +4,6 @@
   $ tsonnet ../../samples/comments/unterminated_block.jsonnet
   ../../samples/comments/unterminated_block.jsonnet:12:1 Unterminated block comment

-  1 "this is code" /*
-  2 This is a block comment
-  3 .
-  4 .
-  5 .
-  6 isn't
-  7 going
-  8 to
-  9 end
-  10 ?
-  11 ?
-  12 ?
-     ^
+  12: ?
+      ^
   [1]
diff --git a/test/cram/errors.t b/test/cram/errors.t
index 679c504..14e45c2 100644
--- a/test/cram/errors.t
+++ b/test/cram/errors.t
@@ -1,6 +1,24 @@
   $ tsonnet ../../samples/errors/malformed_string.jsonnet
   ../../samples/errors/malformed_string.jsonnet:1:22 String is not terminated

-  1 "oops... no end quote
-                        ^
+  1: "oops... no end quote
+     ^^^^^^^^^^^^^^^^^^^^^
+  [1]
+
+  $ tsonnet ../../samples/errors/sum_int_to_boolean.jsonnet
+  ../../samples/errors/sum_int_to_boolean.jsonnet:1:0 invalid binary operation
+  
+  1: 42 + true
+     ^^^^^^^^^
+  [1]
+
+  $ tsonnet ../../samples/errors/sum_int_to_boolean_multiline.jsonnet
+  ../../samples/errors/sum_int_to_boolean_multiline.jsonnet:1:0 invalid binary operation
+  
+  1: (1+1+1+1)+
+     ^^^^^^^^^^
+  2: 2+2+2+2+
+     ^^^^^^^^
+  3: 3+3+3+false+
+     ^^^^^^^^^^^^
   [1]

Look at this beauty! Errors are now much easier to spot.

Conclusion

We've successfully extended our error-handling capabilities to pinpoint parsing errors with clear visual highlighting. This makes debugging and development in Tsonnet significantly more user-friendly, as developers can quickly identify and fix issues in their code.

We've built a robust foundation that can be expanded for future features like type checking. The changes required extensive refactoring, but the results were worth it.

See you in the next issue!


Thanks for reading Bit Maybe Wise! Subscribe to get fresh Tsonnet updates delivered straight to your inbox — we'll help you spot those sneaky bugs faster than you'd find a red-and-white striped programmer in a crowded codebase!
