Introduction

Descriptions of problems in the real world have many superfluous details. An essential step in problem solving is to identify the underlying abstract problem devoid of all unnecessary detail. Similarly, a particular model of computer has many details that are irrelevant to the problem, for example the processor architecture and the word length. One of the arts of computer programming is to suppress unnecessary detail of the problem and of the computer used. An invaluable aid is the use of a high level programming language such as Algol, C, Haskell, Java, ML, Turing, etc..

Once problems are abstracted, it becomes apparent that seemingly different problems are essentially similar or even equivalent in a deep sense. For example, the problems of maintaining a list of students taking a lecture course and of organising a dictionary structure in a compiler have much in common; both require the storage and manipulation of named things and the things have certain attributes or properties. Abstraction allows common solutions to seemingly different problems. By using abstract algorithms and data structures, a solution to a problem has maximum utility and scope for reuse.

Algorithms and data structures can be specified in any adequately precise language. English and other natural languages are satisfactory if used with care to avoid ambiguity but more precise mathematical languages and programming languages are generally preferred. The execution of the latter can also be automated. A program is the expression of some algorithm(s) and data structure(s) in a particular programming language. The program can be used on all types of computer for which suitable language compilers or interpreters exist, making it more valuable.

Sometimes details of a machine do intrude. The word size of the computer may be relevant if very big numbers are to be manipulated. The relative speed of real and integer arithmetic may be critical in simulating a physical system. There may be a trade-off between the speed of a computation and the space it uses. However such details should only be allowed to intrude if absolutely necessary.

Given a problem to solve, we select or design abstract data structures to store the data and algorithms to manipulate them. A data structure is an instance of a data type. The design step is largely independent of any programming language. The coding or programming step is to implement the abstract data structures and algorithms in a particular programming language such as Turing. An abstract data type defines certain static properties of the data and certain dynamic operations on them. It defines what the operations are, not how they work. An algorithm defines some process, how an operation works. It is specified in terms of simple steps. Each step must be clearly implementable in a mechanical way, perhaps by a single machine instruction, by a single programming language statement or by a previously defined algorithm. An algorithm to solve a problem must be correct; it must produce correct output when given valid input data. An algorithm must terminate when given valid input data. When the input data is invalid we usually want the algorithm to warn us and terminate gracefully. Often we want to know how much time an algorithm will take or how much space (computer memory) it will use and this leads to the analysis of algorithms and to the field of computational complexity.

Example: The Abstract Data Type Int

Mathematicians have long studied the integers and their properties. The consensus as to what integer-ness is amounts to a particular collection of axioms or rules that must be obeyed. The way in which integers are represented is unimportant provided only that all readers understand the notation - binary, octal, decimal, hexadecimal, twos complement, ones complement or sign and magnitude, the choice does not matter. What does matter is the way operations on integers behave. The rules define this behaviour, for example addition is commutative and zero is its identity value. Unary plus and minus take an integer and return an integer, denoted int->int. A binary operator like plus takes two integers and returns an integer, denoted int²->int.


Operations:
   +, - :int->int                unary
   +, -, ×, div, ** :int²->int   binary
   odd, even :int->boolean
   =, ~=, >= <=, >, <, :int²->boolean

Selection of rules, for all k, m, n :int
   0 :int
   1 :int
   n+m = m+n            + commutative
   n×m = m×n            × commutative
   n+0 = 0+n = n        + identity
   n×1 = 1×n = n        × identity
   k×(m+n) = k×m+k×n     × distributes over +
   n div m = k  where m×k <= n < m×(k+1),  if m>0

   even(0)
   even(n) if and only if odd(n+1)
   even(n) if and only if not odd(n)

   n < n+1
   k<m and m<n => k<n

The Abstract Data Type Int.

These rules define an ideal for a programming language to attempt to live up to. In practice, a language can only approximate this ideal. For example, many languages place a limit on the size of integers that can be represented in the type int; it is equivalent to:


type int = minint..maxint

Invariably this range is determined by the word size and for a typical 32 bit word and twos complement representation, maxint=2³¹-1 and minint=-2³¹. While the programmer must be aware of this fact it rarely causes difficulty. If it is necessary to manipulate integers outside this range, the programmer must implement the operations for his or her self.

Integer Multiplication

Long-multiplication, the usual manual method for multiplying large integers is an example of an algorithm. If the two integers have d1 and d2 digits respectively, the algorithm requires approximately d1×d2 steps.

    123
  × 456
-------
  49200     123×4 left shifted 2 columns
   6150     123×5 left shifted 1 column
 +  738     123×6
-------
  56088
-------

The HTML FORM below runs a demonstration of integer long-multiplication. Change the values, click the `go' button and experiment:

L
.
A
l
l
i
s
o
n

Try numbers with 6, 8, 10, 12 or more digits and note how the execution time increases, but only increase the number of digits gradually; you have been warned!

Fast Integer Multiplication

It is possible to multiply integers more rapidly than above. This is particularly important for very big integers, e.g., in encryption.

Given two 2n-bit integers, u and v:

u = 2ⁿU₁ + U₀, and v = 2ⁿV₁ + V₀

Note that

(U₁ - U₀)(V₁ - V₀)
= U₁V₁ - U₁V₀ - U₀V₁ + U₀V₀
= U₁V₁ + U₀V₀ - (U₁V₀ + U₀V₁)

Therefore,

U₁V₀ + U₀V₁
= U₁V₁ + U₀V₀ - (U₁ - U₀)(V₁ - V₀)

So the product uv can be expressed as:

uv = 2²ⁿ(U₁V₁) + 2ⁿ(U₁V₀ + U₀V₁) + U₀V₀
= 2²ⁿU₁V₁ + 2ⁿ(U₁V₁ + U₀V₀ - (U₁-U₀)(V₁-V₀)) + U₀V₀
= (2²ⁿ + 2ⁿ)U₁V₁ - 2ⁿ(U₁ - U₀)(V₁ - V₀) + (2ⁿ + 1)U₀V₀

i.e. uv can be found with some shifts (*2ⁿ etc.), additions, subtractions, and just three multiplications of half-length, n-bit integers:

U₁V₁
(U₁-U₀)(V₁-V₀)
U₀V₀

Using this recursively gives an O(n^log₂3)-time algorithm, better than O(n²); after:

D. E. Knuth. Seminumerical Algorithms. The Art of Computer Programming Vol. 2. Addison Wesley, 1969, 1981. Section 4.3.3 page 278.

Even Faster Integer Multiplication

Schonhage and Strassen showed how to multiply two N-digit integers in O(N log(N) log(log(N)))-time:

A. Schonhage & V. Strassen. Schnelle multiplikation grosser zahlen. Computing, 7, pp.281--292, 1971. See [more].

Example: The Abstract Data Type Set

The set is an example of a structured abstract data type.


Type:
   set of Element_Type

Constant:
   {}       the empty set

Operations:
   +, -, × :set² -> set
   =, <=, >=, :set² -> boolean
   in :Element_Type × set -> boolean

Rules, for x:Element_Type and
           s, t, u :set of Element_Type:
   x not in {}
   x in {x}
   x in s+{y} iff x=y or x in s

   x in s+t iff x in s or  x in t     --union
   x in s×t iff x in s and x in t     --intersection
   x in s-t iff x in s and x not in t --difference

   s×(t+u) = s×t + s×u                -- !

   s<=t iff x in s => x in t          --subset
   s<=t iff t>=s
   s=t  iff s<=t and t<=s

The Abstract Data Type Set.

Pascal, say, provides sets where the element type is an index type - int or enumerated types. The reason is that these are easily and efficiently implemented as bit-maps. It is not possible to have a set of alphanumeric words for example and of course each set must be finite.

The sieve of Eratosthenes is an algorithm that is over two thousand years old. It is used to calculate prime numbers between 2 and n. The algorithm uses a set of integers. Initially the set contains all integers between 2 and n. 2 is the first prime number and all multiples of 2 are removed from the set. The next prime number, 3, is found and all of its multiples are removed. This process of finding the next prime and removing its multiples is repeated until only prime numbers are left in the set. The primes can then be printed.

n=20

initial
  sieve: 2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
prime=2        ^     ^     ^     ^     ^     ^     ^     ^     ^remove
         2  3     5     7     9    11    13    15    17    19
prime=3              ^        ^        ^        ^        ^
         2  3     5     7          11    13          17    19
prime=5                          ^              ^        ^
         2  3     5     7          11    13          17    19

final
  sieve: 2  3     5     7          11    13          17    19

Sieving Primes <= 20

Note that the algorithm was expressed in English. Each step is sufficiently simple that it is obvious how to carry it out without giving the matter very much thought. More precision is needed to express the algorithm in a programming language. The set data type is provided in some programming languages. The repetitive nature of the algorithm is implemented by loops or recursion:

[C/List/Sieve.c]

Exercises

Here is an informal description of an algorithm "by example":
If you know how to subtract and multiply small digits, eg. 2 and 3, you can use the algorithm to multiply large digits, eg. 7 and 8, as follows. Write down 7 and 8 side by side. Write 10-7=3 under the 7 and 10-8=2 under the 8. You should now have:
```
   7   8
   3   2
```
Take a diagonal pair (7 and 2 or 8 and 3 it does not matter). Subtract them, getting 5 and write it down. Now multiply the bottom two digits 2*3=6 and write the 6 after the 5. That is the answer: 7*8=56.
```
   7   8
   3   2   7-2=5; 3*2=6; Ans=56
```
Use the method to compute 6*9. Why does it work? Does it always work? Write down a formal and general description of the algorithm in English.
Can it be generalised to multiply larger numbers, eg. 89*97?
Give a piece of code to calculate the median, or middle value, of three distinct numbers x, y and z.
Implement large d-digit decimal integers as array 1..d of 0..9. Write routines to do input, output, +, - for this implementation.
(Medium) Add × to the previous exercise.
(Hard) Add div to the previous exercise.
Write a program to play Conway's "game of life".