Lists |
|
The list is a flexible abstract data type. It is particularly useful when the number of elements to be stored is not known before running a program or can change during running. It is well suited to sequential processing of elements because the next element of a list is readily accessible from the current element. It is less suitable for random access to elements as this requires a slow linear search. Two forms of list are considered, flat lists of elements in this chapter and hierarchical lists in a later chapter.
The definition of list can be read as follows: a list of e is (=) either `nil' or (|) is `cons' applied to an element of type e and a list. Nil represents the empty list. Null tests for the empty list. The head of a list is the first element in the list. The tail of a list is everything but the first element and is a list not an element. Cons constructs a list given an element and a list.
Given the basic operations on lists, a number of useful functions can be defined.
The merge function produces one sorted list when given two sorted lists.
The map function applies a function f to every element of a list and produces a new list. In this way map f processes a single data structure as a whole.
The reduce function inserts a binary function, or operator, between every element in a list. If the list is empty, reduce returns an identity or zero element z.
As always, the issue of efficiency must be attended to. For example, there are various ways to reverse a list. The simplest way but not the best is:
A fast list-reversal algorithm takes O(n) time and uses the accumulating parameter technique.
Note that although there are infinitely many possible lists there are really only two types of list - empty ones and non-empty ones. Almost every function that manipulates a list must deal with these two cases.
Implementation of Flat ListsThe most natural way to implement lists is by means of records/structs and pointers. They can also be implemented by arrays if necessary. Flat Lists by Records and PointersThe list type is defined as a pointer to a record (structure). The basic operations manipulate the pointers: Note that cons is implemented by a procedure in Turing and not by a function because it manipulates a `collection' variable and functions are not allowed to change variables in that language. There is no such problem in Algol68, C, Java or Pascal, say.The extended operations on lists that were described previously are easily defined. If the implementation of lists by pointers can be assumed, it is more efficient, if marginally less clear, to use in-line code such as L=nil(List) in place of null(L) and similarly for the other basic operations.Beware: do not use `length(L)>0' when `not null(L)' is adequate; the former takes O(|L|) time, the latter O(1). The list reversal functions can be coded as follows: Some languages such as C do not support nested, local subroutines so the fast, O(|L|) list reversal routine must be coded by "lifting" the nested routines out, as follows: If the element type of a list can be printed by a library routine, a list can be printed by repeated calls. There are three cases to consider. The empty list should appear as `[ ]', a list of a single element should appear as `[e]' and a list of two or more elements should have the elements separated by commas `[a,b,...]'. The empty list can be treated as a special case by an if statement. The last two cases can be dealt with by using a loop and placing the printing of the current element before the exit and the printing of the comma after the exit. In this way one fewer commas than elements are printed. Flat Lists by ArraysA list can be implemented by an array of records. Rather than using a pointer to indicate the next element, an integer is used to hold the index of the next element. Some convention, such as the use of zero, indicates the null list. This implementation has the disadvantage that an array of the maximum size of the list must be declared even if the list(s) grow and shrink. This method is useful in a language without dynamic storage. The Sieve of Eratosthenes and Sets by ListsThe Sieve of Eratosthenes algorithm is based on a set of integers. Some programming languages that provide a built in `set' data-type restrict the range of elements of a set, e.g. to 32 or 64, so that they can be implemented by bit-masks and bit-wise operations. However, a set can be implemented by a list of members. In the sieve, composite numbers are removed from the set until only primes remain. To calculate all primes less than or equal to n, the sieve is initialised to the set 2..n or the list [2, 3, ...,n]. The routine `range' produces the list [lo, lo+1,..., hi]. A filter routine can be written to remove all multiples of a number P from a set. A sieve routine can then be written to remove all composites from a set. Sieve requires the first member of the set to be a prime (2 is prime). It calls filter to remove all multiples of this prime. At this stage the second number in the set must also be prime and it calls itself recursively to sieve the remaining (tail) elements of the set. When the process terminates, only prime numbers remain in the set.Note that lists are fast for sequential access to elements but slow for random access. Since Eratosthenes' algorithm takes large strides through the set, the above implementation is slower than that given in the introductory chapter. Recursion and IterationMany of the operations on lists and other recursive data types are naturally expressed as recursive routines. Recursive routines have fewer state variables than iterative routines and are frequently easier to prove correct. However recursion does require system-stack space to operate. Iterative versions of many routines, particularly for the simpler operations, are straightforward. Note that we saw an iterative routine to print a list. This is not the place to discuss the pros and cons of recursion. Suffice it to say that recursion and iteration are both powerful and useful techniques.Side EffectsThe list operations defined so far are free of side-effects except to the extent that they perform input and output or reserve dynamic space. With pointers it is easy to create two or more aliases for one variable or area of storage. Assignment via one alias will affect the other implicitly. This can be very useful indeed on efficiency grounds - when done intentionally. It is also easy to cause this sort of side-effect unintentionally which is a frequent source of program bugs. These bugs can be very hard to track down because the symptoms often show up far from the original cause. Despite the dire warnings, the use of side-effects can sometimes lead to more efficient programs. In some circumstances it is also relatively easy to be assured that they can be used safely. For example, there may only be two list variables in use and it may be obvious that they cannot share cells. The cons operation adds an element to the front of a list. (If elements are being added to and deleted from a single list, that can be done, but as a side-effecty.) These operations can also be used to insert and delete internal cells of a list.
A good example of a side-effect involves list reversal. The previous versions made reversed copies of a list. It is also possible to reverse a list in-situ by changing the pointers in its cells. The pointer in each of the input list must be made to point to the previous cell rather than the next cell. The pointer in the first cell should become nil. At least three pointers are needed to step along the list: L, Next and Next2.
GarbageRepeated allocation of dynamic space by new may cause all available storage to be used up. It is often the case that as new dynamic structures are created old structures are discarded or become garbage. If they are not recycled the program may come to a halt prematurely. Some languages provide a system garbage collector to do this recycling as necessary but it is an expensive operation. Instead, the free statement is used to return dynamic storage for reuse. This is of course a special kind of side-effect and unless a language processor carries out checks for aliases, at some cost, its use may cause program bugs unless care is taken. In particular part of a structure must not be freed if it is shared with a second structure - consider append. Free only returns one unit of dynamic storage. To free a complete structure it must be traversed and each cell freed. Testing and DebuggingIf all goes well this section will be unnecessary but the manipulation of pointers is a fruitful area for program bugs. The use of pre- and post-conditions and other assertions which can be tested when the program is being developed, and commented out or turned-off in production, is a good idea. It is useful to write a list output routine as one of your first list routines. This enables tracing statements to be inserted to print out the values of list variables if and when a program fails. Routines on lists can be easily tested or debugged in isolation especially if they are written in a functional style, free of side-effects. Recall that almost all routines must contain this pattern:
If routines are free of side-effects and are individually correct they can usually be combined with great confidence. It is the presence of side-effects, especially the manipulation of global variables, that causes surprising interactions and bugs. The most common errors when using pointers are:
Exercises
© L_Allison
|
|
↑ © L. Allison, www.allisons.org/ll/ (or as otherwise indicated). Created with "vi (Linux)", charset=iso-8859-1, fetched Sunday, 26-Mar-2023 03:29:47 UTC. Free: Linux, Ubuntu operating-sys, OpenOffice office-suite, The GIMP ~photoshop, Firefox web-browser, FlashBlock flash on/off. |