Lossless Abstractions in Error Reporting

Programming languages are typically compiled in at least two stages. First, the text is parsed into a syntax tree. Then the syntax tree is converted into a semantic model. The semantic model might discard information in the syntax tree that does not affect meaning. Whitespace between tokens, for instance, generally does not affect execution in JavaScript, so a JavaScript semantic model might not keep track of it.
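
To make the loss concrete, here is a minimal sketch of a tokenizer that throws whitespace away. The function name `lossyTokens` and the toy token grammar are assumptions for illustration, not how any real JavaScript parser works.

```typescript
// Hypothetical lossy tokenizer: whitespace is discarded, so differently
// formatted sources can produce the same token list.
function lossyTokens(source: string): string[] {
  // Match identifiers, integers, or any single non-whitespace character.
  return source.match(/[A-Za-z_]\w*|\d+|\S/g) ?? [];
}

const compact = lossyTokens("let x=1;");
const spaced = lossyTokens("let   x =\n  1 ;");

// Both sources yield the same tokens, so the original layout of either
// one can no longer be recovered from the token list alone.
console.log(compact);
console.log(spaced);
```

Once two different texts map to the same model, nothing in the model can say which row or column a token came from.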

When the semantic model discovers an issue with the code, however, that whitespace becomes useful for telling the user where the error originated. Without knowing the exact row and column where a semantic node originated in the code, where should the semantic model report the error? Most programming tools report errors with red underlines on the offending text, but unless positional information about the text is carried through parsing and execution, that position cannot be recovered.
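
As a rough sketch of what reporting needs, suppose a semantic node carries a span of character offsets into the source. The `Span` type and the helpers `toRowCol` and `underline` are hypothetical names, and the underline here uses carets in place of an editor's red squiggle.

```typescript
// Half-open range of character offsets into the source text (assumed shape).
interface Span { start: number; end: number; }

// Convert a character offset into a 1-based row and column.
function toRowCol(source: string, offset: number): { row: number; col: number } {
  let row = 1;
  let col = 1;
  for (let i = 0; i < offset && i < source.length; i++) {
    if (source[i] === "\n") { row++; col = 1; } else { col++; }
  }
  return { row, col };
}

// Render the line containing the span with carets under the offending text.
// Spans that cross a line break are clipped to the first line.
function underline(source: string, span: Span): string {
  const lineStart = span.start === 0 ? 0 : source.lastIndexOf("\n", span.start - 1) + 1;
  const newline = source.indexOf("\n", span.start);
  const lineEnd = newline === -1 ? source.length : newline;
  const width = Math.max(1, Math.min(span.end, lineEnd) - span.start);
  return source.slice(lineStart, lineEnd) + "\n" +
    " ".repeat(span.start - lineStart) + "^".repeat(width);
}

const source = "let x = 1;\nx = y + 2;";
const undefinedY: Span = { start: 15, end: 16 }; // the `y` on the second line
const pos = toRowCol(source, undefinedY.start);
console.log(`error at ${pos.row}:${pos.col}`);
console.log(underline(source, undefinedY));
```

None of this works unless the offsets survive every transformation between the text and the node that raises the error.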

One way to solve this problem is simply to retain all information during compilation. The semantic nodes still carry the whitespace before and after them, just in a less accessible location. This is called a [[Lossless Abstraction]]: the semantic model is structured differently so as to simplify further processing, but it does so without losing any information. The semantic model could be converted back into the syntax tree, and from there into the original text, with an inverse of the function that produced it, which makes it possible to find the exact textual origin of errors.
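
A minimal sketch of the idea, assuming each token stores the whitespace that precedes it. The `Token` shape and the functions `losslessTokens`, `toText`, and `offsetOf` are illustrative names rather than an established API, and a flat token list stands in for a full semantic model.

```typescript
// Hypothetical lossless token: the whitespace before the token is kept
// next to its text instead of being thrown away.
interface Token { leading: string; text: string; }

function losslessTokens(source: string): { tokens: Token[]; trailing: string } {
  const tokens: Token[] = [];
  // Group 1: preceding whitespace. Group 2: identifier, integer, or one symbol.
  const pattern = /(\s*)([A-Za-z_]\w*|\d+|\S)/g;
  let consumed = 0;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(source)) !== null) {
    tokens.push({ leading: m[1], text: m[2] });
    consumed = pattern.lastIndex;
  }
  // Whatever whitespace follows the last token is kept as well.
  return { tokens, trailing: source.slice(consumed) };
}

// Inverse of losslessTokens: rebuilds the exact original text.
function toText(tokens: Token[], trailing: string): string {
  return tokens.map(t => t.leading + t.text).join("") + trailing;
}

// Offset of a token's text in the original source, derived from the model alone.
function offsetOf(tokens: Token[], index: number): number {
  let offset = 0;
  for (let i = 0; i < index; i++) {
    offset += tokens[i].leading.length + tokens[i].text.length;
  }
  return offset + tokens[index].leading.length;
}

const source = "let   x =\n  1 ;\n";
const model = losslessTokens(source);
console.log(toText(model.tokens, model.trailing) === source);
console.log(offsetOf(model.tokens, 3)); // offset of the token `1`
```

Because the round trip reproduces the text exactly, any token's position can be computed on demand instead of being stored separately.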

#software #active