3 Syntax and semantics

8.1

3 Syntax and semantics

\(\newcommand{\la}{\lambda} \newcommand{\Ga}{\Gamma} \newcommand{\ra}{\rightarrow} \newcommand{\Ra}{\Rightarrow} \newcommand{\La}{\Leftarrow} \newcommand{\tl}{\triangleleft} \newcommand{\ir}[3]{\displaystyle\frac{#2}{#3}~{\textstyle #1}}\)

A program is a sequence of characters that describes a computational task to be carried out by a computer. Instructions for a human being can be general or vague, and the human has some latitude in how to interpret the instructions. But a computer has no capacity for interpretation. It does precisely as it is told. So, by necessity, a program has a precise meaning and effect.

A model of a programming language provides a way of describing the meaning of a program. Typically this is done informally, by examples. With Racket, we can do better. Racket has few language constructs, so the model description is short. Mostly what we use to create the model is the language itself. In describing the model, I won’t use diagrams and I won’t give descriptions (precise or vague) of any underlying machine.

The value of such a model is that once you understand it and have it in mind when writing programs, it is more likely that your programs will be correct enough to be run, and that they will produce correct results. It also helps you make sense of error messages and quickly correct mistakes. Finally, precise definitions let us reason carefully about programs; that is, we can prove things about their behaviour.

3.1 Syntax and grammar

In the previous chapter, I used the word syntax, which refers to the way we’re allowed to say things. To this I want to add a different concept, semantics, which refers to the meaning of what we can say. For an example of how these two concepts are different, consider the English sentence "Trombones debone clouds." This sentence is grammatically correct; there is nothing wrong with its syntax. But it has no sensible semantic interpretation.

Syntax in English is specified by grammatical rules. English grammatical rules are not strictly enforced. It is possible to violate them and still make oneself understood. As previously mentioned, this is because human beings have the capacity for interpretation. The rules for computer languages must be strictly enforced. In Racket, a program that violates the syntax rules cannot be run. One could imagine a looser policy where the program runs until it has to use the offending section, but the stricter policy saves time.

An English sentence can be made up of a subject, verb, and object, in that order, or perhaps just a subject and verb. There are other choices, but let’s start with these. We might express these two rules with the following notation:

sentence		=		subject verb object
		\|		subject verb

The linguist Noam Chomsky formalized grammars in this fashion in the 1950’s, as part of work to support his thesis that grammar was innate, because of similarities among human languages. The idea proved useful for programming languages as well.

The documentation for Beginning Student starts with the full grammar for the language. It describes definitions like this:

definition	=	(define (name variable variable ...) expr)
	\|	(define name expr)
	\|	(define-struct name (name ...))

The first line is for the definition of a function with one or more parameters. Why the notation specifies that is something we will discuss shortly. The vertical line separates rules, and by convention we don’t repeat the equal sign (which doesn’t signify equality, but something like "is defined as", just as for mathematical definitions) and what is to its left. The second line is for the definition of a constant, and the third line is for a feature that we’ll discuss in the next chapter.

The grammar rules above use expr to denote expressions, which have their own set of rules in the full grammar. It also uses name to denote the names that can be used for constants and user-defined functions, and variable to denote the names that can be used for parameters. We sometimes combine these last two concepts using the word identifiers. The mechanism of grammars can be used to describe identifiers, but this is not particularly useful. Returning to the analogy of the English language, an identifier is like a word, and for words there are spelling rules, rather than grammar rules. The Beginning Student documentation covers these rules informally, and we will do the same. If we were writing a program like DrRacket, it would be a good idea to be more precise, but as users of DrRacket writing small Racket programs, we can cope with a more relaxed definition.

Briefly, identifiers are made up of letters, numbers, hyphens, underscores, and a few other punctuation marks. They must contain at least one non-number (otherwise they would be a number). They can’t contain spaces or any of these: ( ) , ; { } [ ] ‘ ’ . There are also spelling rules for numbers (integers, rationals, decimals) which are fairly intuitive. Symbols start with a single quote ’ followed by something obeying the rules for identifiers.

Of more interest to us are the grammatical rules describing program structure. Let’s start with the ones that say that a program is a sequence of definitions and expressions.

program	=	def-or-expr ...

def-or-expr	=	definition
	\|	expression

This is the second time we have seen three dots in a row. This is known as an ellipsis. In English, it is often used for omission, to indicate something left unsaid, or some words left out of a direct quote. In mathematics, an ellipsis is often used to indicate a pattern, as in the statement "The positive integers less than \(n\) are \(1\), \(2\), \(\ldots\), \(n-1\)."

In our grammar, an ellipsis is used as a pattern meaning "zero or more repetitions". We will introduce some additional uses of the ellipsis later, both to indicate a pattern and to indicate an omission.

One thing that is not addressed in the documentation for Beginning Student is spacing. I said that there has to be a space between arguments to a function. But there could be two spaces, or five, or you could hit Enter/Return several times to move the next argument vertically downward. These are all grouped under the term whitespace. As discussed briefly in the previous chapter, we can use whitespace to our advantage to improve readability. This is mostly a matter of convention, and there’s no need to attempt a more precise definition.

Since the full grammar is given in the Racket documentation, I won’t reproduce all of it here. But I do want to point out one important feature. Consider the following rule for expressions, which says "An expression can be a function applied to one or more argument expressions".

expr

(name expr expr ...)

Here expr appears on both the left and right hand sides. This is not a circular definition, though, as the ones on the right are subexpressions. In effect, this lets us build up arbitrarily large expressions by starting with smaller ones and putting them together. This recursion was an important part of Chomsky’s explanation for the phenomenon of arbitrarily long but still meaningful sentences in human languages. It lets us keep our grammars small and useful, even as the programs (or sentences) they describe can be very large. It will be useful to us in other ways as well.

3.2 A semantic model

As mentioned above, the English sentence "Trombones cook clouds" is syntactically correct but has no semantic interpretation. The English sentence "Time flies like an arrow" is ambiguous; it has more than one semantic interpretation. The most obvious use is metaphorical. An arrow shot from a bow moves swiftly, and time sometimes seems to be passing quickly. But there is a more literal interpretation, assuming some sort of insect known as a "time fly" that is particularly attracted to arrows. We must make sure that our Racket programs are unambiguous. They should have exactly one interpretation.

A semantic model of a programming language provides a method of predicting the result of running any program. This is an important part of being able to diagnose the problem when a program does not behave as expected. Our model will involve simplification of the program via substitution, as we have been doing all along. As mentioned briefly in the previous chapter, a substitution step finds the leftmost outermost subexpression eligible for rewriting, and rewrites it by the rules we are about to describe. Every substitution step yields a valid Racket program, until no further substitutions are possible, and all that remains is a sequence of definitions and values.

We start with the application of predefined Racket functions, such as the arithmetic functions. These we treat as we would in mathematics, trusting the Racket implementers to have done the same.

(+ 3 5) \(\Rightarrow\) 8

(expt 2 10) \(\Rightarrow\) 1024

Formally, the substitution rule is:

(f v1 ... vn) \(\Rightarrow\) v

where f is a predefined function and v the value of \(f(v_1,\ldots,v_n)\). Note the two uses of a pattern ellipsis.

This is fine for arithmetic functions, but there are many predefined functions working on various types of Racket data whose effects are not so clear. Many of these can be defined in terms of other functions, and we will do that from time to time for illustrative purposes. We’ll also later see how a model like this can be extended to be more precise about arithmetic as well.

There is a separate rule for the application of user-defined functions. Consider the following definition.

(define (term x y) (* x (sqr y)))

The function application (term 2 3) can be evaluated by taking the body of the function definition and replacing x by 2 and y by 3. The result is (* 2 (sqr 3)). The rule does not apply if an argument is not a value, as in the case of the second argument in (term 2 (+ 1 2)). Any argument which is not a value must first be simplified to a value using the rules for expressions.

The general substitution rule is:

(f v1 ... vn) \(\Rightarrow\) newexp

where (define (f x1 ... xn) exp) occurs previously, and newexp is obtained by substituting into the expression exp, with all occurrences of the formal parameter xi replaced by the value vi (for i from 1 to n). Note we are using a pattern ellipsis in the rules for both predefined and user-defined functions to indicate several arguments.

Here is a longer example of rewriting using the model.

(define (term x y) (* x (sqr y)))
(term (- 3 1) (+ 1 2))
\(\Rightarrow\) (term 2 (+ 1 2))
\(\Rightarrow\) (term 2 3)
\(\Rightarrow\) (* 2 (sqr 3))
\(\Rightarrow\) (* 2 9)
\(\Rightarrow\) 18

A constant definition binds a name (the constant) to a value (the value of the expression). We add the following substitution rule to deal with the use of constants.

id \(\Rightarrow\) val, where (define id val) occurs previously.

Here is a short example.

(define x 3) (define y (+ x 1)) y
\(\Rightarrow\) (define x 3) (define y (+ 3 1)) y
\(\Rightarrow\) (define x 3) (define y 4) y
\(\Rightarrow\) (define x 3) (define y 4) 4

For conditional expressions, there are three rules: when the first question expression is #false, when it is #true, and when it is else.

(cond [false exp] ...) \(\Rightarrow\) (cond ...)

(cond [true exp] ...) \(\Rightarrow\) exp

(cond [else exp]) \(\Rightarrow\) exp

These suffice to simplify any cond expression. Here we are using omission ellipses to avoid specifying the remaining clauses in the cond.

Here’s an example of the use of these new rules.

(define n 5)
(cond [(even? n) x] [(odd? n) y])
\(\Rightarrow\) (cond [(even? 5) x] [(odd? n) y])
\(\Rightarrow\) (cond [false x] [(odd? n) y])
\(\Rightarrow\) (cond [(odd? n) y])
\(\Rightarrow\) (cond [(odd? 5) y])
\(\Rightarrow\) (cond [true y])
\(\Rightarrow\) y

Running this program will result in an error message stating that y is undefined. A syntax error occurs when a sentence cannot be interpreted using the grammar. A run-time error occurs when an expression cannot be reduced to a value by application of our (still incomplete) evaluation rules.

(cond [(> 3 4) x])

Evaluating this results in a run-time error.

Exercise 6: Create rules for evaluating an expression using an if form. \(\blacksquare\)

There are times when the semantic model will be slightly inconsistent with DrRacket’s behaviour, or with the Stepper that shows step-by-step evaluation. This is usually due to a desire to keep the model simple and usable. For the longer example just above, DrRacket would detect the undefined y immediately and not do any evaluation. But the model correctly predicts an error, even if it is off with respect to the timing. Similarly, it would complicate the model too much to formalize the behaviour of the testing forms such as check-expect. The informal understanding that they are saved up and evaluated at the end suffices.

A case where our semantic model is inconsistent with the Stepper involves the simplification rules we use for Boolean expressions involving and and or. The end result is the same, but the intermediate steps are different. The implementers of the Stepper made choices which result in more complicated rules, but whose intermediate steps appear to help students in lab situations.

(and true exp2 ...) \(\Rightarrow\) (and exp2 ...)
(and false exp2 ...) \(\Rightarrow\) false
(or true exp2 ...) \(\Rightarrow\) true
(or false exp2 ...) \(\Rightarrow\) (or exp2 ...)
(and) \(\Rightarrow\) true
(or) \(\Rightarrow\) false

As in the rewriting rules for cond, we are using an omission ellipsis. The rules permit the use of and and or with one or no arguments, which is the case in full Racket. But in Beginning Student, they must have at least two arguments.

Exercise 7: Create an alternate set of rules that require at least two arguments for and and or. You don’t have to match the behaviour of the Stepper. \(\blacksquare\)

We can and will add to the semantic model when we introduce a new feature of Racket. Understanding the semantic model is very important in understanding the meaning of a Racket program. Doing a step-by-step reduction according to these rules is called tracing a program. It is an important skill in any programming language or computational system.

Unfortunately, once we start dealing with unbounded data, traces get very long, and the intermediate steps are hard to make sense of in the Stepper. In future traces, we will often do a condensed trace by skipping several steps in order to focus on the important transformations. It is very important to know when these gaps occur and to be able to fill in the missing transformations.

In the next chapter, we’ll see how we can create our own datatypes in Racket.

← prev up next →

1	Introduction
2	Starting out with Racket
3	Syntax and semantics
4	Structures
5	Lists
6	Functional Abstraction
7	Efficient Representations
8	Trees
9	Generative Recursion
10	Interpreters