Dynamic Programming 

1D Dynamic Programming Algorithm

The 1D dynamic programming problem is:
Given vertices,
v_{lo}, v_{lo+1}, ..., v_{hi-1}, v_{hi},
directed edges (v_{i},v_{j})
for all i and j where lo≤i<j≤hi, and
an edge cost function on those edges, find a least-cost path from v_{lo} to v_{hi}.

This is a special case of the shortest-path problem; here the graph is a complete, directed acyclic graph as determined by the ordering on the vertices. It can therefore be solved by a variation on the dynamic programming strategy which is also the basis of Prim's minimum spanning tree algorithm and Dijkstra's single-source shortest-paths algorithm. The algorithm maintains paths from v_{lo} to a set of "undone" vertices, initially containing v_{lo+1}, ..., v_{hi}; it also keeps a set of "done" vertices, initially empty. The starting path from v_{lo} to v_{i} consists of the single edge (v_{lo},v_{i}) and is unlikely to be optimal. The algorithm repeatedly finds the undone vertex, v_{b}, that is closest to v_{lo} and moves it to the "done" set. Doing so may reveal shorter paths from v_{lo}, via v_{b}, to the remaining undone vertices. Enough information is kept to trace an optimal path backwards from v_{hi} to v_{lo} when v_{hi} becomes done.
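The algorithm can be sketched as follows. This is a minimal Python sketch only: the name dpa_1d and the cost(i, j) callback are illustrative, and non-negative edge costs are assumed, as in Dijkstra's algorithm.

```python
def dpa_1d(lo, hi, cost):
    """Return (best_cost, path) for a cheapest path from v_lo to v_hi
    in the complete DAG on vertices v_lo .. v_hi."""
    undone = set(range(lo + 1, hi + 1))          # not yet finalised
    # Starting paths: the single edge (v_lo, v_i), unlikely to be optimal.
    dist = {i: cost(lo, i) for i in undone}
    dist[lo] = 0
    pred = {i: lo for i in undone}               # predecessor on best path
    while undone:
        b = min(undone, key=dist.get)            # closest undone vertex
        undone.remove(b)                         # v_b becomes "done"
        for j in undone:                         # shorter paths via v_b?
            if j > b and dist[b] + cost(b, j) < dist[j]:
                dist[j] = dist[b] + cost(b, j)
                pred[j] = b
    path = [hi]                                  # trace backwards from v_hi
    while path[-1] != lo:
        path.append(pred[path[-1]])
    return dist[hi], path[::-1]
```

With cost(i,j) = (j-i)^2, for instance, the cheapest path from v_0 to v_4 takes every unit step, since many short edges are cheaper than one long one.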
This problem appears in various disguises, for example in inserting line breaks for paragraph layout and in polygon fitting. It also appears in the inference and data-compression problem (Farr & Wallace 2002) described in the next section.

SMML Code-Book for the 2-State (~Binomial) Distribution

A transmitter and a receiver want to share the results of n Bernoulli trials, i.e. n throws of a coin, carried out by the transmitter, by sending them over a data link. They could use a fixed code, e.g. based on pr(H)=0.5, in which case the data would be sent at 1 bit per throw. But they might be able to do better than this if the coin is biased. The transmitter, who is the one making the trials, is in a position to estimate θ=pr(H) and to send the results using a code based on θ, but the receiver cannot decode such data unless told the estimate in a header to the message. The transmitter and receiver need a "code-book" for (the parameter of) the 2-state distribution.

Just considering the number of heads, #H, that could turn up, this quantity lies in [0..n]; there are n+1 possible values for #H. It is clearly impossible to infer pr(H) more precisely than something like ±1/(2×(n+1)), and it turns out that a sensible precision is much coarser than this. The code-book contains a partition of [0..n] (which implies a partition of [0.0..1.0]). The transmitter conducts the trials and selects the entry for the part of the partition into which #H falls. The message header is the codeword for that part. The transmitter then transmits the trial results using the code for {H,T} based on the estimate corresponding to the part. The receiver now has the estimate and can decode the results of the trials.

The 1D DPA can be used to partition [0..n] optimally (F&W 2002). It is convenient to set lo=-1 and hi=n. This example illustrates the correspondence between a path and a partition:
The cost of an edge (i,j), i.e. part [i+1..j], is the expected message length over those outcomes containing more than i and at most j heads. A prior probability distribution on θ is needed to complete the calculation; the uniform distribution over [0.0..1.0] is used below for simplicity.
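Under these assumptions the edge cost might be computed as in the following sketch. The estimate used for each part (the part's mean of h/n) and the clamping guard are illustrative choices for this sketch, not taken verbatim from F&W 2002.

```python
import math

def part_cost(i, j, n):
    """Expected message length (bits) contributed by part [i+1 .. j]:
    pr(part) * (header codeword + expected data length using the
    part's estimate of pr(H)).  Uniform prior on theta makes each of
    the n+1 values of #H equally likely."""
    hs = range(i + 1, j + 1)                # values of #H in this part
    count = j - i
    p_part = count / (n + 1)                # pr(#H falls in this part)
    header = -math.log2(p_part)             # codeword length for the part
    est = sum(hs) / (count * n)             # estimate of pr(H) for the part
    est = min(max(est, 1e-9), 1 - 1e-9)     # avoid log(0) at the extremes
    data = sum(-(h * math.log2(est) + (n - h) * math.log2(1 - est))
               for h in hs) / count         # expected data length
    return p_part * (header + data)
```

Summing part_cost over the parts of a partition, i.e. over the edges of a path from v_{-1} to v_{n}, gives the expected two-part message length that the 1D DPA minimises.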
A simple main program runs the algorithm through its paces, e.g.:
Farr and Wallace (2002) show that the SMML code-book problem is feasible, that is computable in polynomial time, for probability distributions having 1D discrete parameter spaces. "The problem in general is [...] NP-hard." "The complexity of the trinomial [3-state] case remains open" but they give a good heuristic for it; the trinomial problem has two parameters, θ_{0}=pr(0) and θ_{1}=pr(1), with pr(2)=1-θ_{0}-θ_{1}.

Approximate Code-Book for the Normal (Gaussian) Distribution

The normal distribution causes us two difficulties: its parameter space, (μ,σ), is two-dimensional, and both parameters are continuous. We can approximate its code-book by considering rectangles in (μ,σ) space and by quantising the range of σ. This is to make two simplifying assumptions: an approximate code-book entry corresponds to a rectangle in parameter space, which is suboptimal in general, and σ is quantised.

μ is an "origin" parameter: there is no nett effect on the likelihood if μ and all data are increased by the same value x. σ is a "scale" parameter: if μ=0 (see above), there is no nett effect on the likelihood if σ and all data are multiplied by the same value x>0. Taking advantage of these facts, σ quanta are chosen so that their log values are equally spaced and, for a given σ interval, the range of μ is divided into some number of equal-sized intervals.

The expected "cost" of using one estimate N_{mEst,sEst} per "patch" (mLo,mHi)×(sLo,sHi) is the cost of sending the estimate plus the expected KL-distance from the (untransmittable) maximum-likelihood estimate to it. This expected KL-distance can be calculated by integration of the KL-distance from N_{maxLH} to N_{mEst,sEst} over the patch. Making the patches smaller reduces this data cost, but it also increases the cost of stating the estimates, giving a trade-off to be optimised.
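The KL-distance between two normal distributions has a closed form, and the patch integral can be approximated numerically. In this sketch the function names, the midpoint-rule grid and the uniform weighting over the patch are illustrative; in a full calculation the per-datum KL-distance would also be scaled by the sample size n.

```python
import math

def kl_normal(m1, s1, m2, s2):
    """KL-distance from N(m1, s1^2) to N(m2, s2^2), in nits
    (standard closed form)."""
    return (math.log(s2 / s1)
            + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2)
            - 0.5)

def patch_cost(mLo, mHi, sLo, sHi, mEst, sEst, steps=50):
    """Average KL-distance from N(mu, sigma) to the single estimate
    N(mEst, sEst) over the patch (mLo, mHi) x (sLo, sHi); the cost of
    stating the estimate would be added to this."""
    total = 0.0
    for a in range(steps):
        mu = mLo + (a + 0.5) * (mHi - mLo) / steps
        for b in range(steps):
            sigma = sLo + (b + 0.5) * (sHi - sLo) / steps
            total += kl_normal(mu, sigma, mEst, sEst)
    return total / steps ** 2
```

Shrinking the patch drives this data term towards zero while the cost of stating more estimates grows, which is the trade-off to be optimised.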
E.g. n=30; μ 0.0..2.0; σ 0.1..1.0 (100 quanta):
References

Farr, G. E. & Wallace, C. S. (2002), The complexity of strict minimum message length inference, The Computer Journal 45(3), pp.285-292.
© L. Allison, www.allisons.org/ll/ (or as otherwise indicated).