Dynamic Programming
1-D Dynamic Programming Algorithm

The 1-D dynamic programming problem is: given vertices
vlo, vlo+1, ..., vhi-1, vhi,
directed edges (vi,vj) for all i and j where lo≤i<j≤hi, and a cost function on those edges, find a least-cost path from vlo to vhi.

This is a special case of the shortest-path problem; here the graph is a complete, directed acyclic graph, as determined by the ordering on the vertices. It can therefore be solved by a variation on the strategy which is also the basis of Prim's minimum spanning tree algorithm and Dijkstra's single-source shortest-paths algorithm. The algorithm maintains best-known paths from vlo to a set of "undone" vertices, initially {vlo+1, ..., vhi}; it also keeps a set of "done" vertices, initially empty. The initial path from vlo to vi consists of the single edge (vlo,vi) and is unlikely to be optimal. The algorithm repeatedly finds the undone vertex, vb, that is closest to vlo and moves it to the "done" set; vb may reveal shorter paths from vlo, via vb, to the remaining undone vertices. Enough information is kept to trace an optimal path backwards from vhi to vlo when vhi becomes done.
This problem appears in various disguises, for example in inserting line breaks for paragraph layout and in polygon fitting. It also appears in the inference and data-compression problem (Farr & Wallace 2002) described in the next section.

SMML Code-Book for the 2-State (~Binomial) Distribution

A transmitter and a receiver want to share the results of n Bernoulli trials, i.e. n throws of a coin, carried out by the transmitter, by sending them over a data link. They could use a fixed code, e.g. based on pr(H)=0.5, in which case the data would be sent at 1 bit per throw, but they might be able to do better than this if the coin is biased. The transmitter, who is the one making the trials, is in a position to estimate θ=pr(H) and to send the results using a code based on θ, but the receiver cannot decode such data unless told the estimate in a header to the message. The transmitter and receiver therefore need a "code-book" for (the parameter of) the 2-state distribution.

Considering just the number of heads, #H, that could turn up, this quantity lies in [0..n]; there are n+1 possible values for #H. It is clearly impossible to infer pr(H) more precisely than something like ±1/(2×(n+1)), and it turns out that a sensible precision is much coarser than this. The code-book contains a partition of [0..n] (which implies a partition of [0.0..1.0]). The transmitter conducts the trials and selects the entry for the part of the partition into which #H falls. The message header is the code-word for that part. The transmitter then transmits the trial results using the code for {H,T} based on the estimate corresponding to the part. The receiver now has the estimate and can decode the results of the trials.

The 1-D DPA can be used to partition [0..n] optimally (Farr & Wallace 2002). It is convenient to set lo=-1 and hi=n; a path from v-1 to vn then corresponds to a partition, each edge (vi,vj) on the path corresponding to the part [i+1..j].
The cost of an edge (i,j), i.e. part [i+1..j], is the expected message length over those outcomes containing more than i and at most j heads. A prior probability distribution on θ is needed to complete the calculation; the uniform distribution over [0.0..1.0] is used below for simplicity.
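Under the uniform prior every value of #H is equally likely a priori, pr(#H=k) = 1/(n+1), so the edge cost just described can be sketched as below. The mid-part estimate and the clamping away from 0 and 1 are simplifying assumptions made here; Farr and Wallace optimise the choice of estimate per part:

```python
from math import log2

def part_cost(i, j, n):
    """Expected message-length contribution (in bits) of part [i+1..j] of [0..n]."""
    q = (j - i) / (n + 1)           # pr(#H lands in this part); uniform prior
    header = -log2(q)               # length of the code-word for the part
    theta = (i + 1 + j) / (2 * n)   # simple mid-part estimate (an assumption)
    theta = min(max(theta, 0.5 / n), 1 - 0.5 / n)  # keep estimate away from 0 and 1
    total = 0.0
    for k in range(i + 1, j + 1):
        data = -k * log2(theta) - (n - k) * log2(1 - theta)  # n-trial data length
        total += (header + data) / (n + 1)
    return total
```

Summing `part_cost` over the parts of a partition gives the expected total message length, which is exactly what the 1-D DPA minimises.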
A simple main program runs the algorithm through its paces, e.g.:
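Such a driver can be sketched as a self-contained program, here using the equivalent O(n²) recurrence dp[j] = min over i<j of dp[i] + cost of part [i+1..j], and assuming a uniform prior with a simple mid-part estimate (both assumptions for illustration):

```python
from math import log2

def part_cost(i, j, n):
    # Expected length contribution of part [i+1..j]: uniform prior,
    # simple mid-part estimate clamped away from 0 and 1 (assumptions).
    q = (j - i) / (n + 1)
    theta = min(max((i + 1 + j) / (2 * n), 0.5 / n), 1 - 0.5 / n)
    return sum((-log2(q) - k * log2(theta) - (n - k) * log2(1 - theta)) / (n + 1)
               for k in range(i + 1, j + 1))

def best_partition(n):
    """Optimal partition of [0..n] by dynamic programming, lo=-1, hi=n."""
    dp, cut = {-1: 0.0}, {}
    for j in range(0, n + 1):
        dp[j], cut[j] = min((dp[i] + part_cost(i, j, n), i) for i in range(-1, j))
    parts, j = [], n           # trace the optimal path/partition backwards
    while j >= 0:
        parts.append((cut[j] + 1, j))
        j = cut[j]
    return dp[n], list(reversed(parts))

if __name__ == "__main__":
    total, parts = best_partition(10)   # e.g. n = 10 trials
    print(total, parts)
```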
Farr and Wallace (2002) show that the SMML code-book problem is feasible, that is, computable in polynomial time, for probability distributions having 1-D discrete parameter spaces. "The problem in general is [...] NP-hard." "The complexity of the trinomial [3-state] case remains open" but they give a good heuristic for it; the trinomial problem has two parameters, θ0=pr(0) and θ1=pr(1), with pr(2)=1-θ0-θ1.

Approximate Code-Book for the Normal (Gaussian) Distribution

The normal distribution causes us two difficulties: its parameter space, (μ,σ), is two-dimensional, and both parameters are continuous. We can approximate its code-book by considering μ|σ and by quantising the range of σ. This amounts to two simplifying assumptions: an approximate code-book entry corresponds to a rectangle in parameter space, which is suboptimal in general, and σ is quantised.

μ is an "origin" parameter: there is no net effect on the likelihood if μ and all data are increased by the same value x. σ is a "scale" parameter: if μ=0 (see above), there is no net effect on the likelihood if σ and all data are multiplied by the same value x>0. Taking advantage of these facts, σ quanta are chosen so that their log values are equally spaced and, for a given σ interval, the range of μ is divided into some number of equal-sized intervals.

The expected "cost" of using one estimate NmEst,sEst per "patch" (mLo,mHi)×(sLo,sHi) is the cost of sending the estimate plus the expected KL-distance from the (untransmittable) maximum-likelihood estimate to it. This expected KL-distance can be calculated by integrating the KL-distance from NmaxLH to NmEst,sEst over the patch. Making the patches smaller reduces this cost to the data, but it also increases the cost of stating the estimates, giving a trade-off to be optimised.
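The KL-distance between two normal distributions has a closed form, so the per-patch data cost can be sketched by numerical integration. The uniform weighting over the patch, the patch-centre estimate, and the grid resolution are assumptions made here for illustration; the cost of sending the estimate itself would be added separately:

```python
from math import log

def kl_normal(m1, s1, m2, s2):
    """KL-distance KL( N(m1,s1^2) || N(m2,s2^2) ) in nats, per datum."""
    return log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

def patch_data_cost(mLo, mHi, sLo, sHi, n, steps=50):
    """Expected extra data cost of one (mu,sigma) patch for n data.

    Integrates n x KL-distance from N(mu,sigma) to the patch's single
    estimate over the patch, on a steps x steps grid with uniform
    weighting (an assumption; a prior on (mu,sigma) would weight it).
    """
    mEst = (mLo + mHi) / 2              # centre of the mu interval
    sEst = (sLo * sHi) ** 0.5           # geometric mean, since sigma is a scale parameter
    total, count = 0.0, 0
    for a in range(steps):
        for b in range(steps):
            m = mLo + (a + 0.5) * (mHi - mLo) / steps
            s = sLo + (b + 0.5) * (sHi - sLo) / steps
            total += n * kl_normal(m, s, mEst, sEst)  # n i.i.d. data
            count += 1
    return total / count
```

Shrinking the patch drives this integral towards zero while the header cost of naming more patches grows, which is the trade-off the text describes.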
E.g. n=30; μ 0.0..2.0; σ 0.1..1.0 (100 quanta):
References
Farr, G. E. & Wallace, C. S. (2002). The complexity of strict minimum message length inference. The Computer Journal, 45(3), pp.285-292.
© L. Allison, www.allisons.org/ll/ (or as otherwise indicated).