Mou's Black White Array

notes and an implementation


/**************************************************************************\
 *  Concepts and Algorithm from:                                          *
 *                                                                        *
 *    Black-White Array: A New Data Structure for Dynamic Data Sets       *
 *    arXiv:2004.09051v1 [cs.DS] 20 Apr 2020                              *
 *    by Z. George Mou                                                    *
 *    gmou5813@gmail.com                                                  *
 *                                                                        *
 *  C implementation and minor bug fixes                                  *
 *    by Thomas Veatch (c)2024                                            *
 *                                                                        *
\**************************************************************************/
A surprising proportion of the field of Theoretical Computer Science is simply sorting and storing: putting stuff in order and into data structures where things can quickly be inserted, found, and deleted. Sorting and Storing. Fast.

So it is a big deal when a new-and-better algorithm arises.

If all you do is learn George Mou's Black White Array algorithm, then you can skip half the book, unless you want to learn the worse ways, those that take up more space, and run more slowly.

Introduction:

Black White Arrays beat all. I quote the Conclusion of George's paper:

The black-white array (BWA) is presented as an array based data structure for dynamic data set(s) supporting effectively and efficiently the insert, search, and delete operations, with mathematically elegant structure, easy-to-understand code, and competitive performance against those based on linked structures such as lists and trees.

The performance data of a C++ implementation of the BWA were presented, which validated the formal analysis of the amortized costs of the operations. The nanosecond and low microsecond wall-clock time per operation suggests that the random access nature of the underlying array structure indeed can minimize the constant factor in front of the predicated asymptotic functions. When it comes to space utilization, BWA is far more efficient than any of those list or tree based approaches by a wide margin.

The versatility of the BWA structure also allows it to easily support augmented operations such as Max, Min, Interval, and IncrementalSort. Hence, it subsumes in functionality some other data structures and algorithms e.g. priority queue and mergesort. As an alternative to those well-known list or tree based data structures for dynamic data sets such as skip lists [17, 21] and red-black tree [1], BWA can be deployed in a broad range of applications.

In case George's paper is too concentrated, this page offers a more diluted, possibly more accessible introduction to the world of BWAs. Let us begin.

Bits of Storage

Every number has a base-2 representation which is a sum of powers of 2: its bits. For example, 5 is 2^2 + 2^0 = 4 + 1 = 0b101 . George Mou has developed the significance and applicability of this basic idea in the context of a data structure that lets you add items to and remove from it, while being able to search it (call it a Dynamic Data Store, a DDS).

Recall that in a binary number, each non-zero bit is not just a 1 shifted up so many bits, like a way of putting marks in columns on paper, but an actual power of two, as in 0x100 = 2^2 = 4. So it is a bit, a power of two, and a number. But in the Black White Array idea, the number that each bit is, is also a storage bin size and also an occupancy count. What is the idea? Every one item added, increases the total number by one. Sure. But we might not have noticed what is also obviously true: it also adjusts the bit representation of the total number. Going from 7 to 8 it extends the rank (highest bit) by one while zeroing the lower bits, as an example, counting from 0x111 to 0x1000 (a carry operation)

Now let's push this treatment from (a) the domain of COUNTING NUMBERS to (b) the domain of STORING a number of VALUES or (preferably sortable) data records. In the counting domain, we have bits with sizes which are powers of two, adding up to make some number. Number of what? Not a number OF anything; it's just a pure number. In the data storage domain, instead of bits we can have bins, with SIZES that are also powers of two, also laid out in a sequence, also indicating the number of elements stored. Number of what? A number of values in the store.

If each bit in a base-2 numbering system, up to some practical maximum, corresponds to a bin or segment or rank sized by that very power of 2 in storage, then for example in a 2^4-1=15 element sized DDS, there will bins of size 2^0=1, 2^1=2, 2^2=4, 2^3=8, (sizes suming up to 2^4-1).

Here's an example with 10 random numbers stored into a Black White Array sized to hold up to 2^4-1=15 elements. (Each bin's contents have been sorted in descending order.)

      Rank: |____________3__________|______2____|___1_|0_|XX|
     White: |48|44|42|41|40|20|17|10|  |  |  |  |29|14|  |XX|  
     Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
     Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
     
         V: | 8| 0| 2| 0|, total=10
See how the vector, V, of bin occupancies is similar to the binary value of the number total = 10 = 0x1010. In fact, 0x1000 = 8, and 0x0010 = 2, so V and total contain the same information, at least at this point. Now when initially filling up a Black White Array, the White bins or ranks are all either full or empty. Also, after an Insert is completed, the bins are stabilized to where the Black ranks are all empty.

But can we really maintain the remarkable proposition that every bin should be full or empty? No, because bins can have a minority of elements deleted from them before triggering a sweep-down operation ("demote()"). If values 10, 17, 20 were deleted from the above BWA then the total occupancy would be 7, but the occupancy vector would be V: | 5| 0| 2| 0|, which is different from the bit pattern of total=7=0x0111, and from the imaginary occupancy vector V: | 0| 4| 2| 1|. The demote() operation can at most sweep a half-empty bin into the next-lower bin to be sorted and put away; this depends on the next smaller Black bin having space for that larger half bin, which is indeed satisfied given the rule that the Black bins are always empty after each Insert() and Delete(). Yes, the Black bins must be entirely empty during, let's call them, Stable Configurations, after an Insert() or Delete() is finished. If only Inserting and never Deleting, then there is a perfect correspondence between a bit (a power of two) within the number of items stored, and a bin in the DDS which is packed full with exactly that many items. The binary digits in the total, in the No-Deletions condition, also tell us where the data elements are: if a bit is zero, then that bin is empty, and if a bit is one, then that bin is full. We will make this as true as possible, and it'll be amazing.

So we put the first value in, it goes into the zero'th bin, which was empty and is now full. We'll call it Active if there are values in a White array.

Now every time we put an added value in, then we have to do a very similar operation with this sequence of bins to what we would do by counting in binary using powers of 2, when all the bits are filled and we add one more, then we have to CARRY.

Consider 0x1 and 0x1, how do we add them? 1+1=2=0x10, but of course now the first bit has become zero, and the second bit now represents (or contains) both. A doubled quantity has carried to the next binary digit.

Similarly in the storage domain, where Mou proposes to have a Black bin, a spare bin, one for every size except the last.

If we have a full Black bin and a non-empty White bin, then this level is too small to hold them all, so they need to move up to the next level (at least! this may be triggered recursively!). But since the next level's Black bin starts empty and is twice as big as this level's bins, we can always use it to hold both Black AND White values from the current level's bins. Indeed, we actually merge-sort them from Black and White bins at the smaller level all together into the best available empty bin at the next level (which might also be the White bin, if that happens to be empty), and then not only have we emptied the lower level by merging up to the next, but we have created a fully sorted bin there.

This whole merge-up operation corresponds to the bit-carry operation in adding to a binary number.

If we have to carry across multiple "1" bits (full bins) at a time, it's fine; we have doubled-in-size bins at the next level to push into. With a newly inserted value, the counts of the carrying bins add up to the size of a larger-and-empty bin that will recieve them. So we carry, but not merely by COUNTING one higher bit, but by MOVING full bins, which is just what corresponds to counting in binary in the domain of storage using these bins.

On the one hand, I think this is genius, because the identity of a bit, on the one hand, with a bin of capacity equal to that bit's bit-shifted value, on the other hand, is BOTH just mind-boggling, and completely obvious when we think about it. 0x01, 0x10, and 0x100 each of them is a single bit, but one is 1, one is 2, and one is 4, and we can keep going. If we use this as an organizing principle for a set of sorting bins for a sorted storage data structure, out falls the Black White Array algorithm.

Again, both our Black and White bins are full, and we can move them together into the next larger, twice-as-large bin in our sequence, which then becomes full, and leaves empty bins behind.

This is the concept in the Black White Array, except that for fast search, we also sort everything as it goes in and at each level as Black and White bins carry upward the larger bins, every transfer has its associated merge-sort, so that all the bins are always separately sorted. This also makes search fast even though you may have to search multiple bins for a value; you can start with the biggest ones and get most search targets in log N binary search.

To summarize, we keep spares for the carries, and you have to use occupancy counts not bits in the total count to manage deletions but now you've got the basic idea.

Code

To get the ball rolling here, I wrote (in C) working small versions the Insert, Search, and Delete functions for a BWA along with pretty-printing so you can watch the data move around. Feel free to download bwa.c and bwa.h from here, if you will send modifications to me for incorporation, we can make a nice open source project of this. I'm compiling on GNU/Linux on an Ubuntu machine, as follows:
% gcc bwa.c -o bwa
Here is a sample run with output. Notice how the counts in V track with the integer values of the bits in the variable "total".
% ./bwa
---------------------------------------------------------
--------------------Black White Array--------------------
--------------------  demonstration  --------------------
---------------------------------------------------------
Generate 15 random #s to insert (<53, RAND_MAX 2147483647):
#0=12 #1=8 #2=22 #3=26 #4=28 #5=5 #6=40 #7=39 
#8=41 #9=32 #10=52 #11=23 #12=2 #13=24 #14=35 
--------------------------------------------------------
Start with this empty BWA: 
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |XX|  V:  0| 0| 0| 0|, total=0
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
--------------------------------------------------------
Now do 15 Inserts: 
Insert(12)
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |  |  |  |  |  |  |  |  |  |  |  |  |  |  |12|XX|  V:  0| 0| 0| 1|, total=1
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Insert(8)
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |  |  |  |  |  |  |  |  |  |  |  |  |12| 8|  |XX|  V:  0| 0| 2| 0|, total=2
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Insert(22)
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |  |  |  |  |  |  |  |  |  |  |  |  |12| 8|22|XX|  V:  0| 0| 2| 1|, total=3
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Insert(26)
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |  |  |  |  |  |  |  |  |26|22|12| 8|  |  |  |XX|  V:  0| 4| 0| 0|, total=4
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Insert(28)
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |  |  |  |  |  |  |  |  |26|22|12| 8|  |  |28|XX|  V:  0| 4| 0| 1|, total=5
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Insert(5)
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |  |  |  |  |  |  |  |  |26|22|12| 8|28| 5|  |XX|  V:  0| 4| 2| 0|, total=6
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Insert(40)
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |  |  |  |  |  |  |  |  |26|22|12| 8|28| 5|40|XX|  V:  0| 4| 2| 1|, total=7
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Insert(39)
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |40|39|28|26|22|12| 8| 5|  |  |  |  |  |  |  |XX|  V:  8| 0| 0| 0|, total=8
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Insert(41)
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |40|39|28|26|22|12| 8| 5|  |  |  |  |  |  |41|XX|  V:  8| 0| 0| 1|, total=9
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Insert(32)
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |40|39|28|26|22|12| 8| 5|  |  |  |  |41|32|  |XX|  V:  8| 0| 2| 0|, total=10
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Insert(52)
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |40|39|28|26|22|12| 8| 5|  |  |  |  |41|32|52|XX|  V:  8| 0| 2| 1|, total=11
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Insert(23)
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |40|39|28|26|22|12| 8| 5|52|41|32|23|  |  |  |XX|  V:  8| 4| 0| 0|, total=12
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Insert(2)
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |40|39|28|26|22|12| 8| 5|52|41|32|23|  |  | 2|XX|  V:  8| 4| 0| 1|, total=13
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Insert(24)
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |40|39|28|26|22|12| 8| 5|52|41|32|23|24| 2|  |XX|  V:  8| 4| 2| 0|, total=14
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Insert(35)
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |40|39|28|26|22|12| 8| 5|52|41|32|23|24| 2|35|XX|  V:  8| 4| 2| 1|, total=15
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0

Done Inserts
-----------------------------------------------------
Now do 15 Searches: 
                       12                              bin 3, val 12, at 10
                           8                           bin 3, val 8, at  9
                    22                                 bin 3, val 22, at 11
                 26                                    bin 3, val 26, at 12
              28                                       bin 3, val 28, at 13
                              5                        bin 3, val 5, at  8
        40                                             bin 3, val 40, at 15
           39                                          bin 3, val 39, at 14
                                   41                  bin 2, val 41, at  6
                                      32               bin 2, val 32, at  5
                                52                     bin 2, val 52, at  7
                                         23            bin 2, val 23, at  4
                                                2      bin 1, val 2, at  2
                                            24         bin 1, val 24, at  3
                                                  35   bin 0, val 35, at  1
-----------------------------------------------------

Now do 15 Deletes from this BWA: 
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |40|39|28|26|22|12| 8| 5|52|41|32|23|24| 2|35|XX|  V:  8| 4| 2| 1|, total=15
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
randomized indices: |41|24|22| 8| 5| 0|40|32| 2|39|52|23|35|26|12|28|
                                   41                  bin 2, val 41, at  6
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |40|39|28|26|22|12| 8| 5|52|  |32|23|24| 2|35|XX|  V:  8| 3| 2| 1|, total=14
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
0'th Delete done (value 41), total=14
                                            24         bin 1, val 24, at  3
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |40|39|28|26|22|12| 8| 5|52|  |32|23|35| 2|  |XX|  V:  8| 3| 2| 0|, total=13
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
1'th Delete done (value 24), total=13
                    22                                 bin 3, val 22, at 11
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |40|39|28|26|  |12| 8| 5|52|  |32|23|35| 2|  |XX|  V:  7| 3| 2| 0|, total=12
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
2'th Delete done (value 22), total=12
                           8                           bin 3, val 8, at  9
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |40|39|28|26|  |12|  | 5|52|  |32|23|35| 2|  |XX|  V:  6| 3| 2| 0|, total=11
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
3'th Delete done (value 8), total=11
                              5                        bin 3, val 5, at  8
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |40|39|28|26|  |12|  |  |52|  |32|23|35| 2|  |XX|  V:  5| 3| 2| 0|, total=10
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
4'th Delete done (value 5), total=10
                                                       bin -1, val 0, at -1
Delete: 0 not found.
5'th Delete done (value 0), total=10
        40                                             bin 3, val 40, at 15
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |52|39|  |32|28|26|23|12|  |  |  |  |35| 2|  |XX|  V:  7| 0| 2| 0|, total=9
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
6'th Delete done (value 40), total=9
                 32                                    bin 3, val 32, at 12
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |52|39|  |  |28|26|23|12|  |  |  |  |35| 2|  |XX|  V:  6| 0| 2| 0|, total=8
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
7'th Delete done (value 32), total=8
                                                2      bin 1, val 2, at  2
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |52|39|  |  |28|26|23|12|  |  |  |  |  |  |35|XX|  V:  6| 0| 0| 1|, total=7
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
8'th Delete done (value 2), total=7
           39                                          bin 3, val 39, at 14
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |52|  |  |  |28|26|23|12|  |  |  |  |  |  |35|XX|  V:  5| 0| 0| 1|, total=6
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
9'th Delete done (value 39), total=6
        52                                             bin 3, val 52, at 15
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |  |  |  |  |  |  |  |  |28|26|23|12|  |  |35|XX|  V:  0| 4| 0| 1|, total=5
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
10'th Delete done (value 52), total=5
                                      23               bin 2, val 23, at  5
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |  |  |  |  |  |  |  |  |28|26|  |12|  |  |35|XX|  V:  0| 3| 0| 1|, total=4
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
11'th Delete done (value 23), total=4
                                                  35   bin 0, val 35, at  1
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |  |  |  |  |  |  |  |  |28|26|  |12|  |  |  |XX|  V:  0| 3| 0| 0|, total=3
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
12'th Delete done (value 35), total=3
                                   26                  bin 2, val 26, at  6
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |  |  |  |  |  |  |  |  |  |  |  |  |28|12|  |XX|  V:  0| 0| 2| 0|, total=2
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
13'th Delete done (value 26), total=2
                                               12      bin 1, val 12, at  2
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |  |  |  |  |  |  |  |  |  |  |  |  |  |  |28|XX|  V:  0| 0| 0| 1|, total=1
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
14'th Delete done (value 12), total=1
                                                  28   bin 0, val 28, at  1
 Rank: |____________3__________|______2____|___1_|0_|XX|
White: |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |XX|  V:  0| 0| 0| 0|, total=0
Black:  XX XX XX XX XX XX XX XX|  |  |  |  |  |  |  |XX|
Index:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
15'th Delete done (value 28), total=0
----------Black-White Array demonstration complete----------
Your thoughts?
(will not be shared or abused)
Comment:
                                          Feedback is welcome.
Copyright © 2024-2025 Thomas C. Veatch. All rights reserved.
Created: July 14, 2024; Modified: September 17, 2023