### 4` `Structures

\(\newcommand{\la}{\lambda} \newcommand{\Ga}{\Gamma} \newcommand{\ra}{\rightarrow} \newcommand{\Ra}{\Rightarrow} \newcommand{\La}{\Leftarrow} \newcommand{\tl}{\triangleleft} \newcommand{\ir}[3]{\displaystyle\frac{#2}{#3}~{\textstyle #1}}\)

Racket provides a general mechanism called structures to create new datatypes. In the teaching languages, structures primarily permit the "bundling" of several values into one. In many situations, data is naturally grouped, and most programming languages provide some mechanism to do this. (Structures have additional features in full Racket.) Although this may seem like a small addition, it considerably increases the expressivity and flexibility of our programs.

#### 4.1` `Syntax and semantics

As a modest initial example, consider the representation of points on the two-dimensional plane. Mathematically, these are written with two numerical coordinates, separated by a comma and surrounded by parentheses, as in the example \((3,4)\). In Racket, we can create a new type of structure to hold two coordinates with the following expression.

(define-struct point (x y))

This does not create any structures directly, but sets up the ability to create point structures with two fields named x and y. In effect, it creates the support functions for a new data type. Those functions can then be used to create and work with instances of the new type.

The define-struct expression above provides functions to create a point structure holding two values, to test whether something is a point structure, and to access the values within a point structure.

We create a point with the constructor function make-point (defined for us by the define-struct expression above) applied to two numeric values.

(define p (make-point 3 4))

We access a value bundled in a point with the accessor functions, again defined for us by define-struct. There is one function for each field. The name of the function is derived from the name of the structure and the name of the field, with a hyphen in between.

(point-x p) \(\Rightarrow\) 3 (point-y p) \(\Rightarrow\) "4"

The type predicate point? (defined for us) tests for point-ness.

(point? p) \(\Rightarrow\) #true (point? 17) \(\Rightarrow\) #false ]

An expression such as (make-point 8 1) is considered a value. This expression will not be rewritten by the Stepper or our semantic rules. The expression (make-point (+ 4 4) (- 3 2)) would be rewritten to (eventually) yield (make-point 8 1).

Here are the rewrite rules needed, specialized to point structures. We’ll discuss the general rules shortly, but you can probably figure them out.

(point-x (make-point vx vy)) \(\Rightarrow\) vx

(point-y (make-point vx vy)) \(\Rightarrow\) vy

(point? (make-point vx vy)) \(\Rightarrow\) #true

(point? v) \(\Rightarrow\) #false for other v

Functions can consume and produce structures.

; distance: Point Point -> Number |

; computes the Euclidean distance |

; between p1 and p2 |

; Example: |

(check-expect (distance (make-point 1 1) |

(make-point 4 5)) |

5) |

(define (distance p1 p2) |

(sqrt (+ (sqr (- (point-x p1) (point-x p2))) |

(sqr (- (point-y p1) (point-y p2)))))) |

; scale: Point Number -> Point |

; scales the point by the given factor |

; Example: |

(check-expect (scale (make-point 3 4) 0.5) |

(make-point 1.5 2)) |

(define (scale p factor) |

(make-point (* factor (point-x p)) |

(* factor (point-y p)))) |

Here is a trace of a use of scale.

(scale (make-point 3 4) 0.5)

\(\Rightarrow\) (make-point (* 0.5 (point-x (make-point 3 4))) (* 0.5 (point-y (make-point 3 4))))

\(\Rightarrow\) (make-point (* 0.5 3) (* 0.5 (point-y (make-point 3 4))))

\(\Rightarrow\) (make-point 1.5 (* 0.5 (point-y (make-point 3 4))))

\(\Rightarrow\) (make-point 1.5 (* 0.5 4))

\(\Rightarrow\) (make-point 1.5 2)

Although we have been using numbers as coordinates, make-point can accept any two Racket values. This is true of structures in general; they do not impose any restrictions on their arguments.

(distance (make-point 'Spider 'Man) |

(make-point 'Peter 'Parker)) |

This causes a run-time error only when the subtraction function in the body of distance tries to subtract 'Peter from 'Spider.

We address this with a data definition, a comment in the program that makes clear what we consider proper.

; A Point is a (make-point n m), where n and m are Numbers.

For the very small programs we write for ourselves initially, this might suffice, but for larger programs or for code that will be used by others, we might want more safeguards against accidental or deliberate misuse. For example, we could write a "guarded" version of the make-point constructor which raises an error on misuse.

(define (g-point m n) |

(cond |

[(and (number? m) (number? n)) (make-point m n)] |

[else (error 'g-point "Points can contain only numbers")])) |

We’ll use the data definition to introduce another element of the design recipe. A template is a code skeleton based on a data definition, to aid in writing the body of functions that consume that type of data. From the data definition for points, ,we can see that if p is a (make-point n m) then n is (point-x p), and m is (point-y p). We put these into a template for functions that consume point structures.

; my-point-fn : Point -> ? |

(define (my-point-fn p) |

... (point-x p) ... |

... (point-y p) ...) |

The template is available to us as a starting point when writing specific functions. While this may seem unnecessary for points, the idea will be useful when our use of structures becomes more complex.

Let’s nail down the general semantics of structures. Consider a structure s with \(n\) fields.

(define-struct s (f1 ... fn))

This introduces a constructor function make-s, accessor functions s-f1 through s-fn, and a type predicate s?, with the following rewrite rules:

(make-s v1 ... vn) is a value.

(s-fi (make-s v1 ... vi ... vn)) \(\Rightarrow\) vi for all i.

(s? (make-s v1 ... vn)) \(\Rightarrow\) true

(s? v) \(\Rightarrow\) false for other v.

Exercise 8: Using the clock struct defined by:

(define-struct clock (hours mins secs))

to represent 24-hour time (midnight is 00:00:00, one second before is 23:59:59), write the function tick that consumes a clock struct c and produces the clock struct representing the time one second after the time represented by c.

As an example, (tick (make-clock 23 59 59)) (representing the time 23:59:59) should evaluate to (make-clock 0 0 0) (representing the time 00:00:00).

Strive to make your solution brief and readable. There are many different solutions to this question, so as a challenge, try to think of two different ways to solve it. \(\blacksquare\)

Exercise 9: The set of numbers \(a+c\sqrt{b}\), where \(b\) is a natural number greater than 1 without repeated prime factors, and \(a\) and \(c\) are rational numbers, forms a quadratic field. Such numbers can be represented exactly in Racket (without using inexact numbers or the sqrt function) as (make-qfe a c b) using the following definition:

(define-struct qfe (rat irrat base))

Write the function add which consumes two qfe structs and produces the qfe struct representing their sum. Also write sub, mult, and div, each of which consume two qfe structs and produce one qfe struct, and which implement subtraction, multiplication, and division, respectively.

As an example, the number \(1-\sqrt{5}\) is represented by (make-qfe 1 -1 5), and the result of multiplying this number by itself, \(6-2\sqrt{5}\), is represented by (make-qfe 6 -2 5). Thus (mult (make-qfe 1 -1 5) (make-qfe 1 -1 5)) should produce (make-qfe 6 -2 5).

These operations require the bases of the two operands to be the same. Your code should signal an error using the predefined error function, which you can read about in the documentation for Beginning Student (in the "Misc" section). Similarly, your code should test for and signal an error in the case of attempted division by zero. You do not need to check that the base has no repeated prime factors (but this is an interesting task to think about). \(\blacksquare\)

#### 4.2` `Simulating natural numbers

Natural numbers are provided by Racket as part of the built-in number system. But if they were not provided, we could simulate them using structures. We will spend some time exploring this idea. This exercise gives us insight into the foundations of mathematics and into the close relationship between mathematics and computer science.

We will choose to use the following structure definitions:

(define-struct Z ()) |

(define-struct S (pred)) |

The Z structure has no fields. We can make one with the expression (make-Z). The S structure has one field. We will use that field to hold either a Z structure or another S structure. It may not have occurred to you to put a structure inside another of the same type, but this idea is key to being able to handle unbounded data.

You may be used to the natural numbers starting at 1. But in computer science, and in formal logic and the foundations of mathematics, the natural numbers start at 0. If we don’t make this choice, we have to add special cases for 0, and everything gets messier.

We will choose the following interpretations. The Racket expression (make-Z) represents the natural number \(0\). What an S structure represents is a bit more complicated. "S" in this case stands for "successor". The successor of the natural number \(0\) is the natural number \(1\).

The Racket expression (make-S (make-Z)) is our representation of the successor of \(0\), namely \(1\). Similarly, (make-S (make-S (make-Z))) represents \(2\), because \(2\) is the successor of \(1\).

In general, if e is the representation of the natural number \(n\), then (make-S e) is the representation of the successor of \(n\), namely the natural number \(n+1\).

We will call our representation of natural numbers Nat.

Data definition: a Nat is either (make-Z) or it is (make-S p), where p is a Nat.

This data definition is different from the one we had for Point. It is self-referential: it contains a reference to the thing being defined. In computer science, we call this a recursive definition. It is not a circular definition, just as our grammar rule for expressions was not circular. Part of the definition, referring to (make-Z), is not self-referential. The self-referential part refers to how to create a new Nat value from an already-justified Nat value.

The predecessor of a natural number \(n\) is \(n-1\). Zero has no predecessor. How might we write a Racket function to compute the predecessor of a Nat?

Since a Nat is either a Z or an S structure, this suggests the following skeleton:

(define (pred nat) |

(cond |

[(Z? nat) ...] |

[(S? nat) ...])) |

Filling in the ellipses is straightforward.

(define (pred nat) |

(cond |

[(Z? nat) (error "can't apply pred to Z")] |

[(S? nat) (S-pred nat)])) |

(check-error (pred (make-Z)) "can't apply pred to Z") |

(check-expect (pred (make-S (make-Z))) |

(make-Z)) |

(check-expect (pred (make-S (make-S (make-Z)))) |

(make-S (make-Z))) |

You can now see why pred is the name of the field in a S structure. It holds the representation of the predecessor of the number that the whole S structure represents (provided that it satisfies the definition for Nat).

The previous example suggests the following template for functions that consume a Nat. It combines the idea of the Point template (list the accessor functions) with a cond for the two cases in the Nat definition.

(define (my-nat-fn nat) |

(cond |

[(Z? nat) ...] |

[(S? nat) ... (S-pred nat) ...])) |

This will work for some functions, but the next example shows that in some cases, it would help to be a little more precise. Let’s try to work out addition for Nats.

; plus : Nat Nat -> Nat

The template only mentions one parameter. We can add a second one.

(define (plus nat1 nat2) |

(cond |

[(Z? nat1) ... nat2 ...] |

[(S? nat1) ... (S-pred nat1) ... nat2 ...])) |

Filling in the first case is easy.

(define (plus nat1 nat2) |

(cond |

[(Z? nat1) nat2] |

[(S? nat1) ... (S-pred nat1) ... nat2 ...])) |

The second case seems harder.

(define (plus nat1 nat2) |

(cond |

[(Z? nat1) nat2] |

[(S? nat1) ... (S-pred nat1) ... nat2 ...])) |

In the case where nat1 is an S structure, what can we do with its predecessor and nat2?

We can add them together and compute the successor of the result.

(define (plus nat1 nat2) |

(cond |

[(Z? nat1) nat2] |

[(S? nat1) (make-S (plus (S-pred nat1) nat2))])) |

This is a recursive function: it uses itself in its own body. Like the recursive definition of Nat, it is not circular. Traces will give us confidence that it works. Here is the trace of adding the Nat representations of zero and zero, which should result in the Nat representation of zero.

\(\Rightarrow\)

(cond |

[(Z? (make-Z)) (make-Z)] |

[(S? (make-Z)) |

(make-S (plus (S-pred (make-Z)) (make-Z)))]) |

Here is the trace of adding the Nat representations of one and zero, which should result in the Nat representation of one.

\(\Rightarrow\)

(cond |

[(Z? (make-S (make-Z))) (make-Z)] |

[(S? (make-S (make-Z))) |

(make-S (plus (S-pred (make-S (make-Z))) (make-Z)))]) |

(cond |

[false (make-Z)] |

[(S? (make-S (make-Z))) |

(make-S (plus (S-pred (make-S (make-Z))) (make-Z)))]) |

(cond |

[(S? (make-S (make-Z))) |

(make-S (plus (S-pred (make-S (make-Z))) (make-Z)))]) |

\(\Rightarrow\) (make-S (plus (make-Z) (make-Z)))

At this point, the trace is not done, but we have traced the evaluation of (plus (make-Z) (make-Z)) already, and we know it results in (make-Z), so this trace results in (make-S (make-Z)).

It is clear that traces will be very long and boring if we list every step. The symbol \(\Rightarrow^*\), meaning "yields in zero or more steps", will be helpful. For example, we can use it to skip to the answer in a cond.

(plus (make-S (make-Z)) (make-Z)) \(\Rightarrow^*\) (make-S (plus (make-Z) (make-Z))) \(\Rightarrow^*\) (make-S (make-Z))

We can now do a condensed trace of a longer computation, adding the Nat representations of two and two.

(plus (make-S (make-S (make-Z))) (make-S (make-S (make-Z))))

\(\Rightarrow^*\)
(make-S (plus (make-S (make-Z))) (make-S (make-S (make-Z))))

\(\Rightarrow^*\)
(make-S (make-S (plus (make-Z) (make-S (make-S (make-Z))))))

\(\Rightarrow^*\)
(make-S (make-S (make-S (make-S (make-Z)))))

The full trace demonstrates that two plus two equals four, at least for Nats.

Recall the data definition for Nat: a Nat is either (make-Z) or it is (make-S p), where p is a Nat. Now that we have the idea of recursion in a function definition, we can further refine our template:

(define (my-nat-fn nat) |

(cond |

[(Z? nat) ...] |

[(S? nat) ... (my-nat-fn (S-pred nat)) ...])) |

The recursive application of the function happens in the case corresponding to the self-referential part of the data definition.

This is known as pure structural recursion. A function consuming Nats uses pure structural recursion if it conforms to the template we just developed.

We can add more Nat parameters (as in the case of plus, where we added nat2) but they must either be unchanged in the recursive application of the function within its body, or we can apply S-pred to them, as with the nat parameter in the template. No other variations are permitted.

Pure structural recursion is to be preferred because it is easier to reason about, as we will see. However, not every function consuming Nats uses pure structural recursion. This function computes the result of dividing a natural number by two and rounding down.

(define (idiv2 nat) |

(cond |

[(Z? nat) (make-Z)] |

[(equal? nat (make-S (make-Z))) (make-Z)] |

[else (make-S (idiv2 (S-pred (S-pred nat))))])) |

Why is this not pure structural recursion? First, there is an extra case for the representation of \(1\). Second, the recursive application is not on (S-pred nat), but on (S-pred (S-pred nat)). The term for this kind of recursion is generative recursion, because we are "generating" a non-structural argument for the recursive application. It is a lot harder to get right, so we avoid it when we can.

If we apply idiv2 to the representation of \(5\), it should produce the representation of \(2\), and a condensed trace confirms this.

(idiv2 (make-S (make-S (make-S (make-S (make-S (make-Z)))))))

\(\Rightarrow^*\) (make-S (idiv2 (make-S (make-S (make-S (make-Z))))))

\(\Rightarrow^*\) (make-S (make-S (idiv2 (make-S (make-Z)))))

\(\Rightarrow^*\) (make-S (make-S (make-Z)))

Make sure you can fill in the missing steps. This is a good idea in general when dealing with condensed traces.

#### 4.3` `Correctness

If you were to try to write down a precise definition of natural number and a precise definition of addition, based on what you knew before you started reading this, they would probably be longer and more complicated than the ones we just saw.

But how can we be sure that plus correctly implements addition? I could turn the question around and ask how can you be sure that your previous idea of addition is actually correct? I’m not going to engage with that philosophical question, but I want to spend a little time talking about correctness.

One thing we can do to increase our confidence in our definitions is to show that plus has the properties we expect of addition. Let’s start with some very simple properties.

Theorem: \(0+2 = 2\).

This is almost too simple to be considered a mathematical theorem, but the corresponding theorem in our Racket representation is not so obviously true.

Theorem: (plus (make-Z) (make-S (make-S (make-Z))))

\(\Rightarrow^*\) (make-S (make-S (make-Z)))

The proof of this second theorem is a trace.

(cond |

[(Z? (make-Z)) (make-S (make-S (make-Z)))] |

[(S? (make-Z)) |

(make-S (plus (S-pred (make-Z)) (make-S (make-S (make-Z)))))]) |

(cond |

[true (make-S (make-S (make-Z)))] |

[(S? (make-Z)) |

(make-S (plus (S-pred (make-Z)) (make-S (make-S (make-Z)))))]) |

Here is a theorem that is a little more interesting.

Theorem: For all Nats x, (plus (make-Z) x) \(\Rightarrow^*\) x.

Here is a proof of this theorem.

\(\Rightarrow\)

(cond |

[(Z? (make-Z)) x] |

[(S? (make-Z)) (make-S (plus (S-pred (make-Z)) x))]) |

While this conforms to our semantic model, it does not represent a proper Racket computation, because x is not defined. It is not so much a trace as a trace schema. Substituting any specific Nat for x will give a valid trace.

In general, to prove a statement that starts with "For all Nats x", one method is to remove the for-all and prove the rest of the statement treating x as an unknown, as we do in algebra. Our reasoning must remain valid when any specific Nat is substituted for x in the proof.

This is known as for-all introduction in formal logic. It is quite common. For example, in mathematics, when we prove \((x+y)(x-y)=x^2-y^2\) by using the distributive law, we are really proving "For all \(x,y\), \((x+y)(x-y)=x^2-y^2\)" using for-all introduction.

If our logical system permits this as valid reasoning, it is useful. But this method has its limitations. Here is a theorem for which for-all introduction will not suffice.

Theorem: For all natural numbers \(n\), \(n+0=n\).

Once again, we take this for granted in mathematics (in formal logic, it does require a proof) but the corresponding theorem for plus is not obvious.

Theorem: For all Nats x, (plus x (make-Z)) \(\Rightarrow^*\) x.

Trying to construct a trace schema fails, because we don’t know enough about x to be able to make progress in our semantic model of Racket computation. Since plus does a case analysis using a cond, we can try a case analysis in our proof. Since x is a Nat, it is either (make-Z) or (make-S y) for some Nat y.

The case where x is (make-Z) can be proven with a trace.

(cond |

[(Z? (make-Z)) (make-Z)] |

[(S? (make-Z)) (make-S (plus (S-pred (make-Z)) (make-Z)))]) |

But the reasoning is not so clear in the case where x is (make-S y).

(cond |

[(Z? (make-S y)) (make-Z)] |

[(S? (make-S y)) (make-S (plus (S-pred (make-S y)) (make-Z)))]) |

(cond |

[(S? (make-S y)) (make-S (plus (S-pred (make-S y)) (make-Z)))]) |

\(\Rightarrow\) (make-S (plus y (make-Z)))

We have the same problem with y that we did with x originally. But we have managed to show that (plus (make-S y) (make-Z)) \(\Rightarrow^*\) (make-S (plus y (make-Z))), using a trace schema which gives a valid trace when we substitute any Nat y.

Suppose we had a trace showing (plus y (make-Z)) \(\Rightarrow^*\) y for a
particular Nat y.
We can take this trace and wrap each expression in (make-S ...)
to get a trace schema showing (make-S (plus y (make-Z))) \(\Rightarrow^*\) (make-S y).
Putting this trace together with the one we get from the trace schema above,
we get a trace of

(plus (make-S y) (make-Z)) \(\Rightarrow^*\) (make-S y).

To summarize, if we can show (plus y (make-Z)) \(\Rightarrow^*\) y,
then we can show

(plus (make-S y) (make-Z)) \(\Rightarrow^*\) (make-S y).

We want to show that for all Nats x, (plus x (make-Z)) \(\Rightarrow^*\) x. We can show this if x is (make-Z), by a trace. We can then use the technique above to show this if x is (make-S (make-Z)), and then (make-S (make-S (make-Z))), and so on. Since we can construct a trace for any given Nat, the result must be true for all Nats.

Again, our logical system has to consider this kind of reasoning valid. The term for this is structural induction on Nats (we can use the idea for other data types as well). Generalizing to other properties of Nats we may wish to prove, suppose we have a statement of the form "For all Nats x, P[x] holds", where P is a description of a property of x. For example, P[x] could be (plus x (make-Z)) \(\Rightarrow^*\) x.

If we can prove the following two things, we can conclude that the for-all statement holds.

P[(make-Z)] holds, and

For all Nats y, if P[y] holds, then P[(make-S y)] holds.

Here P[(make-Z)] means P with (make-Z) substituted for x.

With the technique of structural induction on Nats, we can prove many more things, such as the commutative and associative laws.

Commutative law:
For all Nats x, y, z,
(plus x y) \(\Rightarrow^*\) z

if and only if (plus y x) \(\Rightarrow^*\) z.

Associative law:
For all Nats x, y, z, w,
(plus (plus x y) z) \(\Rightarrow^*\) w

if and only if
(plus x (plus y z)) \(\Rightarrow^*\) w.

We take these properties for granted with natural numbers, but at some level, they do require justification.

We can translate what we have learned with our simulation of natural numbers back to dealing with mathematical natural numbers and Racket’s built-in numbers.

Data definition:

A natural number is either \(0\) or \(m+1\), where
\(m\) is a natural number.

Here is a template for a structurally-recursive function consuming a natural number.

(define (my-nat-fn n) |

(cond |

[(zero? n) ...] |

[(positive? n) ... (my-nat-fn (sub1 n)) ...])) |

Structural induction on natural numbers is also known as mathematical induction.

To prove "For all natural numbers \(n\), P[\(n\)] holds":

Prove P[\(0\)] holds, and

Prove for all natural numbers \(m\), if P[\(m\)] holds, then P[\(m+1\)] holds.

Translating our plus function back to built-in numbers gives us a function that adds without using the predefined +.

(define (natural-plus n m) |

(cond |

[(zero? n) m] |

[(positive? n) (add1 (natural-plus (sub1 n) m))])) |

This is not particularly useful, as it is less efficient than +. But it does serve as an example of the use of the template, which will come in handy for other computations. The better efficiency of built-in Racket numbers compared to Nats invites the question: what representation is DrRacket using? We will return to this question in a later chapter.

The idea of proving things about computation is deep and fascinating, but we will not dwell too long on it here. It is examined in greater detail in my flânerie Logic and Computation Intertwined (LACI), which is a natural successor to this one.

Exercise 10: Write the function in-triangle that consumes four point structs p1, p2, p3, and p, representing points in the Euclidean plane with rational coordinates, and produces a Boolean value which is true if and only if the point p is on or within the (possibly degenerate) triangle defined by p1, p2, and p3.

There are many possible ways to answer this question mathematically, and it would not be cheating to use the Internet to research them (though you may enjoy working them out yourself). Keep in mind that Racket can represent rational numbers with arbitrarily large numerators and denominators, and your function must be correct for all rational arguments. Consequently, you should avoid Racket computations that produce inexact values (e.g. square roots or trigonometric functions) as they may result in errors.

You should make effective use of helper functions, and write a lot of tests. Fundamentally, this is an exercise about boundary conditions. \(\blacksquare\)

Exercise 11: Write the Racket functions times and compare.

times consumes two Nats nat1 and nat2, representing natural numbers \(n\) and \(m\), and produces the Nat representing \(nm\).

compare consumes two Nats nat1 and nat2, representing natural numbers \(n\) and \(m\), and produces 'less if \(n<m\), 'equal if \(n=m\), and 'greater if \(n>m\).

Use structural recursion on Nats to write your functions. Try to use structural recursion on one parameter, but if that does not work, use structural recursion on two parameters. Do not use Racket’s built-in numbers, or any functions that consume or produce them. \(\blacksquare\)

Exercise 12: Write the Racket function perfect, which consumes a positive integer n and produces true if and only if n is perfect.

A positive integer is perfect if it is equal to the sum of its divisors (including 1 but not including itself). You may find the predefined function remainder useful.

Any recursive function you write (perfect or any helper functions) must use structural recursion on natural numbers. \(\blacksquare\)