Denotational Semantics

There are two main aspects to a programming language - its syntax and its semantics. The syntax defines the correct form for legal programs and the semantics determines what they compute (if anything).

Programming Language = Syntax + Semantics

The treatment of syntax in programming languages has been very successful. Backus Naur Form (or Backus Normal Form) BNF was first used to describe the grammar of Algol-60. BNF is a meta language for syntax. Once a good notation exists it is natural to use it mathematically to describe classes of grammar (e.g., regular, context free, context sensitive), to investigate parsing algorithms for various classes and so on. Most programming languages from BNF's inception onwards have had "nice" syntax whereas most of those from before it have not, e.g., Fortran. Computer programs to manipulate BNF were written and could check various properties of a grammar (e.g., LL(1)) or translate it into a parser, giving us parser generators.

The other popular meta language for grammars is the syntax diagram which is equivalent to BNF and there are easy translations from one to the other.

Models for semantics have not caught-on to the same extent that BNF and its descendants have in syntax. This may be because semantics does seem to be just plain harder than syntax. The most successful system is denotational semantics which describes all the features found in imperative programming languages and has a sound mathematical basis. (There is still active research in type systems and parallel programming.) Many denotational definitions can be executed as interpreters or translated into "compilers" but this has not yet led to generators of efficient compilers which may be another reason that denotational semantics is less popular than BNF.

On Mathematical Models of Computation.

Finite State Automata and Turing Machines.

A finite state automata (FSA) consists of (i) a finite set of states S, one of which is the start state, (ii) a finite alphabet A, (iii) a transition function t:S×A→S.
In addition depending on how you want to define FSA's, either there is an accepting state and a rejecting state, or some of the states are labelled as accepting and some as rejecting.

The notion of FSA is very simple yet it clearly captures the essence of some things in computing. First of all a lot of hardware devices are FSAs, e.g., think of a lift controller. Secondly a lot of software devices are FSAs, e.g., think of lexical analysis in a compiler.

Once we have a definition of a mathematical concept, such as FSA, we can do mathematics on it. For example, the class of language that FSAs accept is exactly the regular languages. We can compose two or more FSAs and form their product etc.

A Turing machine is an FSA with an unbounded tape that it can read from and write to; so now the transition function t:S×A→S×A×{fwd,rev}. This captures much of the essence of a computer: The FSA is the control unit and we can only build FSAs with digital devices (gates etc.). The tape is the store but note that it is unbounded. A real computer has a finite, if large, store, i.e., still an FSA. However we can count backing tapes, removable discs and other off-line media and in principle we could manufacture more if the machine needed. In other words, the tape or store is well described for many purposes as being unbounded, although the maximum amount of tape that can be read or written within t steps is limited to t.

The Turing machine is very simple and is easily shown to be equivalent to an actual physical computer in the sense that they can compute exactly the same set of functions. A Turing machine is much easier to analyse and prove things about than any real computer. However, Turing machines are unpleasant to program, to say the least. Therefore they do not provide a good basis for studying programming and programming languages.

Lambda Calculus.

Lambda Calculus is a simple mathematical system:-

<exp> ::= <identifier>
        | λ<identifier>.<exp>
        | <exp> <exp>
        | ( <exp> )

-- Lambda Calculus Syntax --

It turns out that the Lambda Calculus' four lines of (abstract) syntax, plus conversion rules, are sufficient to define booleans, integers, conditional expressions (if), arbitrary data structures and computations on them. It is therefore a good candidate for being considered the prototype programming language and has inspired Lisp and the modern functional programming languages.

Lambda Calculus was designed to study function definition, function application, parameter passing and recursion. As originally presented, its semantics are defined by its conversion rules and these are purely syntactic - they could be implemented in a decent text editor or macro-processor. This is not considered abstract enough.

A potential problem with syntactic definitions is that they can be meaningless (no meaning) or ambiguous (many possible meanings). This is one reason why we would prefer programs to stand for or to denote some abstract objects which we could reason about.

Problems of Circularity.

Here are few well worn paradoxes:-

This statement is false.
1: Statement 2 is true. 2: Statement 1 is false.
The barber is a man who shaves all men who do not shave themselves. Who shaves the barber?
Let S = the set of all sets that are not members of themselves. S = {T | not(T in T)}. Is S a member of S or not?

Paradoxes are a bit of fun, but recall the proof of the halting problem. Suppose that we can write a program 'halts', then we can write another program 'paradox':-

  procedure paradox(q);

    procedure halts(p, d):boolean;
     {returns true if program p
      halts when run on data d,
      returns false otherwise}
    begin{halts}
     ...impossibly cunning code...
    end{halts};

  begin while halts(q, q) do skip
  end{paradox}

Question: does paradox(paradox) loop or halt?

A description of paradox is:- Paradox is a program which halts when applied to programs which loop when applied to themselves. Now carry out the following substitutions:-

Replace 'loop' by 'do not halt':- Paradox is a program which halts when applied to programs which do not halt when applied to themselves.
Replace 'halt(s) when applied to' by 'shave(s)':- Paradox is a program which shaves programs which do not shave themselves.
Replace 'program(s)' by 'man (men)' and 'which' by 'who':- Paradox is a man who shaves men who do not shave themselves.
Replace 'paradox' by 'the barber':- The barber is a man who shaves men who do not shave themselves.

Sound familiar?

Semantics Based on Abstract Sets.

We prefer to have semantics based on abstract objects rather than on purely syntactic rules which might be meaningless or ambiguous, or on concrete objects which might be implementation dependent or superseded. An almost too familiar example is the language of Numerals which stand for integers:-

V : Numerals→Int --valuation fn

<Numeral> ::= <Numeral><Digit>
            | <Digit>
<Digit> ::= 0|1|2|3|4|5|6|7|8|9

V[0] = 0
V[1] = 1
V[2] = 2
V[3] = 3
V[4] = 4
V[5] = 5
V[6] = 6
V[7] = 7
V[8] = 8
V[9] = 9

n :Numeral
d :Digit

V[nd] = V[n]*10 + V[d]

Semantics of Decimal Numerals

Note that the numerals '123', '0123' and '00123' all stand for the integer 123. '123' is a string and 123 is an integer. Have you ever seen an integer?

Potential Problems with Semantics based Naively on Set Theory.

We would like our programming languages to manipulate some set of values which includes at least integers and booleans say:-

   Value=Int+Bool+...

We can model arrays, records, lists, and trees etc as products and sums of other values:-

   Value = Int + Bool
         + (Value×Value) +...

We also want functions, Value→Value, to be "first class values":-

   Value = Int + Bool
         + (Value×Value)
         + (Value→Value)+ ...

However there is a well known result in set theory that if Value→Value is the set of all mappings from Value to Value then Value→Value is strictly larger than Value and so the equality above cannot hold.

The Solution.

The collection of computable functions, which is what Value→Value must be in the context of programming languages, must be much smaller than the set of all mapping from Value to Value. There are some well known mappings that are not computable, for example, halts :(Value→Value)×Value→Bool. This turns out to be our salvation.

Scott developed a theory of lambda calculus in which (computable) functions (→) turn out to be monotonic and continuous under a certain partial order. (This has little to do with the usual numerical notions of continuity and monotonicity.)

From a mathematical point of view a function is just a special kind of relation. A binary relation on sets A and B is a subset of A×B. A function, f, from A to B is a relation such that for each 'a' in A, there is at most one entry <a,b> in f, but for each b' there could be zero, one or many entries <a',b'>, <a",b'> etc., i.e. functions are in general many to one.

  {<0,1>, <1,1>, <2,2>,
   <3,6>, <4,24>, ... }

  -- The Factorial Function --

The problem is to relate an equation or program for factorial to this abstract object.

  1.let factorial n =
       if n=0 then 1
       else n*factorial(n-1)

  2.let F f n =
       if n=0 then 1
       else n*f(n-1)
    let factorial' = Y F

  -- Factorial Programs --

The first program is a self-referential equation which might have 0, 1, 2, several, or even infinitely many solutions. It turns out that this equation does have infinitely many solutions! Fortunately it has a special one, the least fixed-point, and this is the natural meaning of the equation.

  f₀ = λ n.undefined   --loops
     = { }  the undefined fn

  f₁ = F f₀
     = λ n.if n=0 then 1
                  else n*f₀(n-1)
     = {<0,1>}

  f₂ = F f₁
     = λ n.if n=0 then 1
	          else n*f₁(n-1)
     = {<0,1>, <1,1>}

  f₃ = F f₂
     ={<0,1>, <1,1>, <2,2>}

  f₄ = F f₃
     ={<0,1>, <1,1>, <2,2>,
       <3,6>}

  etc.

None of the functions f₀, f₁, ... is the factorial function, but note that they all approximate it, in the sense that they agree with it where they are defined. The factorial function is the limit of this sequence of functions. This is where the continuity mentioned above is needed; we do not want any "jumps" at the limit.

Incidentally, the infinitely many solutions for factorial come from the ability to specify an arbitrary result for f(-1), say. This fixes the results for other negative inputs. This is an arbitrary choice and should not be part of the factorial function which is the least defined function that agrees with all f_i and is undefined on negative inputs.

References.

L. Allison. A Practical Introduction to Denotational Semantics. Cambridge University Press, 1986 [programs].
D. S. Scott. Logic and programming languages. CACM, 20(9), pp.634-641, Sept 1977.
R. D. Tennent. The denotational semantics of programming languages. CACM, pp.437-453, Aug 1976.