A PDC Scaffold

made visible

On Divide and Conquer

For basic definitions (\(!\), \(\#\), \(:\), \(d\), \(c\), \(lr\), \(eo\), \(corr\), \(mirr\), \(atom?\), \(f_b\), etc.) see the Tutorial.

As elsewhere we define

\(pdc = PDC(d,\ c,\ pre, post, baseq, basef)\)

so that \(pdc\) is itself a function, meaning:

\(pdc\ =\ if\ atom?\ then\ f_b\)
\(\ \ \ \ \ \ \ \ \ \ \ \ \ else\ c\ :\ post\ :\ !\ pdc\ :\ pre\ :\ d\)
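The definition can be rendered as a short executable sketch. Python and the list rendering are my assumptions here, and the serial list comprehension stands in for the parallel map \(!\); a real implementation would run the recursive calls on separate processors.

```python
# Minimal sketch of the PDC combinator (Python assumed, serial stand-in
# for the parallel "!" map). Names mirror the text's definition.

def PDC(d, c, pre, post, baseq, basef):
    """pdc = if atom? then f_b else c : post : ! pdc : pre : d"""
    def pdc(x):
        if baseq(x):                        # atom?  -- base case
            return basef(x)                 # f_b
        halves = d(x)                       # d: divide
        halves = pre(halves)                # pre-adjust (id here)
        halves = [pdc(h) for h in halves]   # ! pdc: apply to each half
        halves = post(halves)               # post-adjust
        return c(halves)                    # c: combine
    return pdc

# Left-right divide and combine on Python lists (length a power of 2):
d_lr = lambda x: [x[:len(x) // 2], x[len(x) // 2:]]
c_lr = lambda halves: halves[0] + halves[1]
```

With \(pre = post = id\), \(baseq = atom?\), and \(basef = id\), this instance is simply the identity on its input, which is a useful sanity check.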

Below we apply \(pdc\) to an arbitrary input, making all its activities visible and hopefully its pattern of dependencies and capabilities clear. The steps are unrolled in a somewhat generic computation graph.

You can do a lot with just this much, by providing your own arbitrary local operation \(l\) to work on each (possibly communication-informed) element, along with your own base function \(f_b\) which transforms input-domain values into output-domain values in the simplest of cases.

And as below, a very typical communication is indeed \(\#\ !\ corr\), and the most typical base predicate and base function are \(atom?\) and \(id\).

Yes, as you may have noticed, there are extra copies of things around; that's because the operations can then be done separately, on different processors, in parallel.

Here the input has 8 positions, which implies \(\log_2(8)=3\) levels of division, combination, communication, and re-application of \(l\). But the input could be arbitrarily large, an array of \(2^n\) positions for any finite \(n\), and the time complexity for the whole unboundedly-large parallel operation can be just \(n = \log_2(2^n)\) time steps.

The \(corr\) communication pattern is made explicit using graphic links between elements in adjacent half-arrays. See how both ends of each link have the same index within adjacent half-arrays? That's the idea.
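In code, that same-index pairing between two half-arrays can be sketched as follows. This is a hypothetical helper of my own (Python assumed); tuples `(own, corr)` stand in for the linked pairs drawn in the graphic.

```python
# Sketch of the corr correspondence: element i of each half is linked to
# element i of the other half. "# ! corr" pairs each element with the
# value of its correspondent; tuples model the pairing a.b in the trace.

def corr_exchange(halves):
    lo, hi = halves
    lo2 = [(a, b) for a, b in zip(lo, hi)]  # each lo[i] learns hi[i]
    hi2 = [(b, a) for a, b in zip(lo, hi)]  # each hi[i] learns lo[i]
    return [lo2, hi2]
```

For example `corr_exchange([[0, 1], [2, 3]])` links 0 with 2 and 1 with 3, matching the (1x) row of the trace below.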
A PDC scaffold: \(lr, post, corr\)

\(pdc = PDC(d,\ c,\ pre, post, baseq, basef)\)

where:

\(d = d_{lr}, c = c_{lr}\)
\(baseq = atom?, basef = f_b \)
\(pre = id\) (which makes this a post-adjust algorithm)
\(post\ =\ !\ l\ :\ \#\ !\ corr\)

Sample Input    0     1     2     3     4     5     6     7     first, divide
\(d_{lr}\) (0x)    0 1 2 3 | 4 5 6 7    atom? = false
\(d_{lr}\) (1x)    0 1 | 2 3 | 4 5 | 6 7    atom? = false
\(d_{lr}\) (2x)    0 | 1 | 2 | 3 | 4 | 5 | 6 | 7    atom? = true: stop, do \(f_b\)
\(f_b\)    \(f\ \)0 \(f\ \)1 \(f\ \)2 \(f\ \)3 \(f\ \)4 \(f\ \)5 \(f\ \)6 \(f\ \)7    → output data type

\(\#\ !\ corr\) (0x)    \(f\)0.\(f\)1 \(f\)1.\(f\)0 \(f\)2.\(f\)3 \(f\)3.\(f\)2 \(f\)4.\(f\)5 \(f\)5.\(f\)4 \(f\)6.\(f\)7 \(f\)7.\(f\)6    C: Communicate
\(!\ l\) (0x)    \(l_0\ 0\) \(l_0\ 1\) \(l_0\ 2\) \(l_0\ 3\) \(l_0\ 4\) \(l_0\ 5\) \(l_0\ 6\) \(l_0\ 7\)    L: Local operation
\(c_{lr}\) (0x)    \(l_0\ 0\) \(l_0\ 1\) \(l_0\ 2\) \(l_0\ 3\) \(l_0\ 4\) \(l_0\ 5\) \(l_0\ 6\) \(l_0\ 7\)    S: Structure change

\(\#\ !\ corr\) (1x)    \(l_0 0.l_0 2\) \(l_0 1.l_0 3\) \(l_0 2.l_0 0\) \(l_0 3.l_0 1\) \(l_0 4.l_0 6\) \(l_0 5.l_0 7\) \(l_0 6.l_0 4\) \(l_0 7.l_0 5\)    C: each to+fro corr
\(!\ l\) (1x)    \(l_1\ 0\) \(l_1\ 1\) \(l_1\ 2\) \(l_1\ 3\) \(l_1\ 4\) \(l_1\ 5\) \(l_1\ 6\) \(l_1\ 7\)    L: \(l\)
\(c_{lr}\) (1x)    \(l_1\ 0\) \(l_1\ 1\) \(l_1\ 2\) \(l_1\ 3\) \(l_1\ 4\) \(l_1\ 5\) \(l_1\ 6\) \(l_1\ 7\)    S: \(c_{lr}\)

\(\#\ !\ corr\) (2x)    \(l_1 0.l_1 4\) \(l_1 1.l_1 5\) \(l_1 2.l_1 6\) \(l_1 3.l_1 7\) \(l_1 4.l_1 0\) \(l_1 5.l_1 1\) \(l_1 6.l_1 2\) \(l_1 7.l_1 3\)    C: each to+fro corr
\(!\ l\) (2x)    \(l_2\ 0\) \(l_2\ 1\) \(l_2\ 2\) \(l_2\ 3\) \(l_2\ 4\) \(l_2\ 5\) \(l_2\ 6\) \(l_2\ 7\)    L: calculate
\(c_{lr}\) (2x)    \(l_2\ 0\) \(l_2\ 1\) \(l_2\ 2\) \(l_2\ 3\) \(l_2\ 4\) \(l_2\ 5\) \(l_2\ 6\) \(l_2\ 7\)    S: All done
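The whole trace can be executed as a tiny self-contained sketch (Python is my assumption; serial recursion stands in for the parallel processors). Taking the local operation \(l\) to be "add the communicated value," this scaffold instance leaves the total of all inputs at every position, an all-reduce in \(\log_2(N)\) communication steps:

```python
# Self-contained sketch of the lr/post/corr scaffold: pre = id,
# post = ! l : # ! corr, basef = id. With l = addition of the
# correspondent's value, every position ends up holding the total sum.

def pdc(x, l):
    if len(x) == 1:                     # atom? -> f_b = id
        return x
    mid = len(x) // 2
    lo, hi = pdc(x[:mid], l), pdc(x[mid:], l)      # d_lr, then ! pdc
    lo2 = [l(a, b) for a, b in zip(lo, hi)]        # post = ! l : # ! corr
    hi2 = [l(b, a) for a, b in zip(lo, hi)]
    return lo2 + hi2                    # c_lr

print(pdc([0, 1, 2, 3, 4, 5, 6, 7], lambda own, other: own + other))
# → [28, 28, 28, 28, 28, 28, 28, 28]
```

Swapping in a different \(l\) (max, min, xor, ...) gives the corresponding all-reduce on the same scaffold, unchanged.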

Let's consider \(!\ pdc : pre : d\) from the function definition. This applies right to left: \(d\) first, then \(pre\), then \(pdc\). After we first divide the provided input and then (since this is a post-adjust PDC with \(pre = id\)) do a no-op for the pre-adjustment, \(!\ pdc\) distributes the PDC function itself, applying it separately to each of the division outputs, the half-arrays produced by \(d\). The next (recursive) application of \(pdc\) starts by applying \(d\) to that input, doing a second division, which is again followed by a no-op pre-adjustment and by a third, nested call to \(pdc\). You may infer, though it is not explicitly shown on a single line, that the post-adjustments and combinations that come along with each level of call to \(pdc\) are waiting on the execution stack for their turn to come up. When the divisions finally yield atomic (length-1) arrays, \(f_b\) runs, followed by the lowest-level \(post\), then the lowest-level combination \(c\), followed by the higher levels one after another.
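To watch those stacked promises unwind, here is a tiny instrumented sketch of my own (Python and the names assumed) that prints each event with its recursion depth. The divisions all print first, then \(f_b\) at the atoms, then the waiting post/combine steps fire from the deepest level outward:

```python
# Instrumented divide-and-conquer trace: indentation shows recursion
# depth; "d" = divide, "f_b" = base case, "post/c" = the deferred
# post-adjust and combine popping off the execution stack.

def pdc_trace(x, depth=0):
    pad = "  " * depth
    if len(x) == 1:
        print(pad + f"f_b    {x}")
        return x
    print(pad + f"d      {x}")
    mid = len(x) // 2
    lo = pdc_trace(x[:mid], depth + 1)
    hi = pdc_trace(x[mid:], depth + 1)
    out = lo + hi
    print(pad + f"post/c {out}")
    return out

pdc_trace([0, 1, 2, 3])
```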

Hopefully, by meditating on the code for a PDC, mentally reviewing the sequence of operations in the definition, and keeping the essentials in mind, you will be able to hold in memory that stack of execution promises, and thereby understand the global logic of how the PDC definition interleaves with itself, how the operations apply to the parts, and how independent parts can be done in parallel. When you start to intuit how it works, you can write in a few lines, in minutes, algorithms that others write in ten or tens of pages, taking weeks or months, if not forever.

The general form of the intuition is to consider Communication in some direction from one half to the other, then Local calculation or operation to make use of that communicated information, then a Structure manipulation, which amounts to a rotation, so that the halves communicating with each other are contrasted across a different direction.

For example, send something from 0 to 1; rotate; then from 0 and 1 to 10 and 11; rotate; then from 00, 01, 10, and 11 to 100, 101, 110, and 111. Now, in \(\log(N)\) steps, what was at position 0 is at all \(N\) positions. Right?
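That doubling broadcast can be sketched directly over a flat array. The `broadcast` helper is hypothetical (mine, Python assumed); array indices play the role of the binary addresses above, and each round doubles the set of informed positions.

```python
# Doubling broadcast: at round k, positions 0 .. 2^k - 1 each send their
# value a distance of 2^k, so log2(N) rounds inform all N positions.

def broadcast(values):
    n = len(values)                 # assume n is a power of 2
    out = list(values)
    step = 1
    while step < n:
        for i in range(step):
            out[i + step] = out[i]  # 0..step-1 send to step..2*step-1
        step *= 2
    return out

print(broadcast([7, 0, 0, 0, 0, 0, 0, 0]))
# → [7, 7, 7, 7, 7, 7, 7, 7]
```

Here the three rounds are exactly the 0→1, {0,1}→{10,11}, {00..11}→{100..111} sends described above.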

Rotate, communicate, calculate, repeat. With such an approach you could learn everything and conquer the world.

For baby examples, definitely check out how on this scaffold you can hang broadcast communication, reverse, etc., very easily; with a little thought you can even do miracles with FFT (like achieving \(\log(N)\) time complexity), as well as the impossible Fibonacci, etc. George doesn't like my FFT and Fib() yet, so please give me feedback.

In any case, throughout, the emphasis should be on trying to see the simplicity and obviousness of things, from some perspective or other, for when you do see these, which are its mathematical properties, you can prove the algorithm's correctness. And when *I* finish my homework, you will be able to run it in a verifier and see that it actually works.

Footnote: Incidentally, the structure change can be seen as a rotation. If the sequential input array is conceptually transferred to the corresponding corners of a hypercube (here, 3D, so inputs 0 to 7 go to corners 000 to 111), then each division cuts the hypercube in a different direction, establishing half-to-half correspondences across the most significant bit, then the second most significant, and finally the least significant bit. The first division, for example, separates the elements numbered 0xx from those numbered 1xx into the low and high halves, and places them into natural correspondence according to which elements 0xx and 1xx share an identical address suffix xx. In other words, division rotates the hypercube so that the correspondences are across dimension 2; another division rotates so that the correspondences are across dimension 1, while the last division puts them across dimension 0.

Copyright © 2025 Thomas C. Veatch. All rights reserved.
Created: May 5, 2025