Table of Contents
This is a continuation of the "CA on a GPU" theme. This time, I'm going to describe Margolus variation of CA neighborhoods, quite a different one from the Moore neighborhood. I will start by introducing the Margolus neighborhood (MN), then implement a generic part of an ARB program for MN CA emulation and talk about a couple of debugging techniques.
Last, two cellular automata will be considered, a simple colliding gas simulation (TM GAS), and "Critters" CA, having (probably) rather obscure physical meaning, discussion of which we are going to postpone for now.
As I will skip the description of general structure of OpenGL "wrapper" program, it might be useful to read the first post beforehand for the sake of comprehension.
Contrary to the Moore neighborhood, where rules specify what transformations each cell undergoes depending on its own state and the states of its neighbors, the Margolus neighborhood operates on 4cells blocks, which I'll denote as quads here. That is, for every possible state of 2x2 block (a quad) of cells, there is a corresponding state, which the quad should take on the next time step. Plus, on every other step, lattice partitioning into quads is shifted to 1 cell both horizontally and vertically, lest all the quads would evolute independently of each other.
In the picture above, the same initial state of CA is shown, where only two cells are in "active" state. If we
choose the partitioning scheme on the left, we would have to apply 3 rules to calculate next CA state: a rule for
an empty quad (a lot of times), and 2 rules for a quad's state with only 1 active cell (one in upright corner,
another in upleft corner). If, by contrast, we choose the partitioning on the right, then both active cells are
in the same quad, and we would apply only 2 rules: again, the one for the empty quad, and another one for the
quad with 2 active cells on the lower row.
Irrespectively of initial partitioning chosen, CA simulation requires that we alternate between them on every step when simulating.
Since any shader program operates on pixels, not quads, we need to be able to know where in the quad the current pixel is situated.
Let's try to set up an ARB temp variable 'quad'
, in such a way that it has only
one of its four components set to 1
, denoting the position of the current pixel in
the quad. Let's define the following correspondence between XYZW
components of
4vector and quad's cells (blue corners shows alternating oddsteps quad partitioning):
We will start with determining if X coordinate of the pixel is odd or even, do the same for Y coordinate, and store the result in some ARB temp variables:
Example 1. Odd/even testing for X and Y axes
TEMP shift; # remember, we pass 1/TEX_WIDTH as a parameter to the program MOV shift, program.env[1]; # x  positive, z  negative # get fragment position in the quad TEMP mod_pos; RCP mod_pos, shift.x; # mod_pos = (1/shift.x, 1/shift.x, ...) MUL mod_pos, fragment.texcoord, mod_pos; # mod_pos = (x/shift.x, y/shift.x, ...) # since fragment.texcoord is (step/2+step*N), where N is natural number, SUB mod_pos, mod_pos, 0.5; # we need to subtract 0.5 for the first term MUL mod_pos, mod_pos, 0.5; # and finally divide by 2 to get either natural M or M+0.5 FRC mod_pos, mod_pos; # get fraction  mod_pos.xy are now either ~0 or ~0.5 ADD mod_pos, mod_pos, 0.25; # mod_pos.xy are either ~0.25 or ~0.75 TEMP quad_x; # y z 0 1 1 0 SGE quad_x.zw, mod_pos.xxxx, 0.5; #    SLT quad_x.xy, mod_pos.xxxx, 0.5; # x w 0 1 1 0 # x>0.5 x<0.5 TEMP quad_y; # y z 1 1 0 0 SGE quad_y.yz, mod_pos.yyyy, 0.5; #    SLT quad_y.xw, mod_pos.yyyy, 0.5; # x w 0 0 1 1 # y>0.5 y<0.5
At this point, quad_x
variable has its left "column" set to 1
s when X
coordinate of the pixel is even, and right column, when it's
odd. Same for quad_y
, but "rows" are being set up instead of "columns". All we have
left to do is to multiply them and adjust for odd steps, where we have to shift the
partitioning:
Example 2. Finalizing cell's position in the quad
TEMP quad; MUL quad, quad_x, quad_y; # quad contains one and only one component set to 1 # shift the quad for odd steps CMP quad.xyzw, program.env[0], quad.zwxy, quad.xyzw;
This CMP
operation above kicks in only when program.env[0]>0
, and we surely set it to be 1
only for odd
steps of the simulation (example on Java/jogl):
int _flip = 1; float _pr_param[][] = new float[][] { { 0, 0, 0, 0 }, { 1, 1, 1, 1 } }; gl.glProgramEnvParameter4fvARB(GL.GL_FRAGMENT_PROGRAM_ARB, 0, _pr_param[_flip = 1  _flip], 0);
For odd steps, we are swapping X<>Z
, Y<>W
components of the quad. Indeed, it's clear from the picture, that, for example,
Y
(or upleft) position in the quad is, in fact, W
(or
downright) position in the shifted quad.
One might wonder how can we check that we got everything right in the ARB code from the previous section.
After all, there is quite a lot of ARB code, doing lots of shady operations with fractional values, trying to
get exact 0
s and 1
s in the end...
The answer is the following approach, roughly equivalent to the so called "printf debugging" in other languages:
Example 3. Testing pixel quad positioning calculation
!!ARBfp1.0 # quad variable calculation goes here, skipped... TEMP check; # checking that quad is correct MOV check, 0; CMP check.r, quad.x, 1, check; CMP check.g, quad.y, 1, check; CMP check.b, quad.z, 1, check; CMP check.a, quad.w, 1, check; # check is red/gree/blue/black for DL/UL/UR/DR pixel position correspondingly MOV result.color, check; END
This code will result in drawing of different patterns for odd and even steps, both consisting of only red,
green, blue, and black colors. If there is a bug in 'quad' variable calculation (like, several components are
set to 1
s instead of only one), the rendered picture could contain more colors than
these four.
Now, to implement the logic of any Margolus neighborhood CA, we need to know the states of the current cell
(obtained trivially via
instruction), and states of
other three neighbors in the current quad. Let's say we have TEX
R
component encoding the
state of the cell, and try to define a 4vector variable 'quadv'
(short for "quad
values") holding the significant R
component of all quad member cells as its
XYZW
components:
First, let's calculate two temporary variables, 'vert'
and 'horiz'
, holding significant components of all 8 Moore
neighbors as theirs XYZW
components. I will divide 8 neighbors to two 4vectors based
on the following scheme:
Here is the ARB code to get Moore neighbors and store them in two 4vectors:
Example 4. Calculation of 'horiz' and 'vert' temporary variables
TEMP cur_pos; # remember, shift.x is positive shift, shift.z  negative TEMP cur_clr; # let R component be significant #  z w TEMP horiz; #  ADD cur_pos, fragment.texcoord, shift.zzww; # 1, 1 #  c TEX cur_clr, cur_pos, texture, 2D; #  MOV horiz.x, cur_clr.r; # x y ADD cur_pos, fragment.texcoord, shift.wzww; # 0, 1 TEX cur_clr, cur_pos, texture, 2D; MOV horiz.y, cur_clr.r; ADD cur_pos, fragment.texcoord, shift.wxww; # 0, +1 TEX cur_clr, cur_pos, texture, 2D; MOV horiz.z, cur_clr.r; ADD cur_pos, fragment.texcoord, shift.xxww; # +1, +1 TEX cur_clr, cur_pos, texture, 2D; MOV horiz.w, cur_clr.r; # x  TEMP vert; #  ADD cur_pos, fragment.texcoord, shift.zxww; # 1, +1 # y c z TEX cur_clr, cur_pos, texture, 2D; #  MOV vert.x, cur_clr.r; #   w ADD cur_pos, fragment.texcoord, shift.zwww; # 1, 0 TEX cur_clr, cur_pos, texture, 2D; MOV vert.y, cur_clr.r; ADD cur_pos, fragment.texcoord, shift.xwww; # +1, 0 TEX cur_clr, cur_pos, texture, 2D; MOV vert.z, cur_clr.r; ADD cur_pos, fragment.texcoord, shift.xzww; # +1, 1 TEX cur_clr, cur_pos, texture, 2D; MOV vert.w, cur_clr.r;
Second, as soon as we have 'horiz'
and 'vert'
calculated, it should be easy to carefully select correct components (based on known position of the
current cell in the quad) and store them as 'quadv'
:
Example 5. Finalizing 'quadv' calculation
# quadv variable holds all 4 values of the current quad TEMP quadv; TEX cur_clr, fragment.texcoord, texture, 2D; MOV quadv.xyzw, cur_clr.rrrr; # first, fill all quadv components with the current cell's state # then, rewrite 3 quadv components corresponding to the Margolus neighbors CMP quadv.yz, quad.x, horiz.zzww, quadv.yyzz; # x > quadv.yz = horiz.zw CMP quadv.x, quad.y, horiz.yyyy, quadv.xxxx; # y > quadv.x = horiz.y CMP quadv.xw, quad.z, horiz.xxyy, quadv.xxww; # z > quadv.xw = horiz.xy CMP quadv.z, quad.w, horiz.zzzz, quadv.zzzz; # w > quadv.z = horiz.z CMP quadv.w, quad.x, vert.zzzz, quadv.wwww; # x > quadv.w = vert.z CMP quadv.zw, quad.y, vert.zwzw, quadv.zwzw; # y > quadv.zw = vert.zw CMP quadv.y, quad.z, vert.yyyy, quadv.yyyy; # z > quadv.y = vert.y CMP quadv.xy, quad.w, vert.yxyx, quadv.xyxy; # w > quadv.xy = vert.yx
In the code fragment above, we're rewriting 3 and only 3 components of 'quadv'
, corresponding to three Margolus neighbors. For example, if current pixel happens to
be downleft pixel in the quad (meaning quad=(1, 0, 0, 0)
), then we rewrite
quadv.yz
(upper row) components with horiz.zw
, and
quadv.w
(downright cell) with vert.z
. Similarly, other
cases of current cell's position in the quad are handled.
It's possible to check whether 'quadv'
got filled correctly or not, using similar
"printf debugging" method we used before. If we do an identity
transformation, i.e:
Example 6. Identity drawing based on calculated 'quadv'
TEMP check; # checking that quadv is correct CMP check, quad.x, quadv.xxxx, check; # if current pixel is at X (downleft) position, # use quadv.x value CMP check, quad.y, quadv.yyyy, check; # Y position > quadv.y value CMP check, quad.z, quadv.zzzz, check; # etc. CMP check, quad.w, quadv.wwww, check; MOV result.color, check; END
Then we should see the initial state on every step of the simulation. If CA state changes, that would mean
'quadv'
values were calculated incorrectly.
At this point, we have enough background and utility code to implement an actual CA with Margolus neighborhood. We will start with socalled TM GAS automata (using terminology from this great book).
Consider the following simple Margolus neighborhood rule: rotate every quad
clockwise on even steps of the simulation, and counterclockwise on odd ones. Here is what happens
when every cell can be in one of two states (1
 gas particle is present in the cell,
0
 no gas particle in the cell):
So, particles move horizontally (and vertically), not noticing each other. This is a very simple model of a gas,
since there are only four moving directions, constant speed, and no collisions. We can introduce collisions by
adding one more rule: do not rotate a quad, if there are exactly two particles and
they're moving in the opposite directions. Note that such rule doesn't break the momentum
conversation law:
In the picture above, two particles arrive to the same quad after counterclockwise rotation at t=1
. Since there are exactly two particles in the quad, and they would have normally moved in the
opposite direction, rule 2 of TM GAS denies clockwise rotation of the quad
at t=0
. Then simulation proceed as usual at t=1
, and
particles do move (now vertically), since they are not in the same quad
anymore.
Another good enhancement of the TM GAS is introduction of walls. Let's say there is a 3^{rd} possible state of a cell  the wall state. Let's not rotate quads having at least one wall in them at all. It's easy to see that this rule leads to particles reflecting from the walls (I will left this fact for the reader to prove).
Since all rules of TM GAS results in only rotations of quads, there is no way that a particle will penetrate the wall, if there was no such particle (inside the wall) in the initial state. Also, this means that the number of particles is constant (meaning total energy of the system is constant too).
First, let's define how do we encode the three possible states of a cell:

Empty space is encoded as
(0, 0, 0, 0)
. 
A particle is encoded as
(1, 0, *, *)
, whereZ
andW
components are actually insignificant. 
A wall is encoded as
(*, 1, *, *)
, where onlyG
component is significant.
Second, since we have walls now, we'll need to extend 'quadv'
calculation a little bit,
and define also 'quadg'
variable, holding wall information for the current quad, if
any:
Example 7. Respecting walls while calculating 'horiz' temporary variable
# 'r' component is significant for particles # 'g' component  for walls TEMP horiz_g; #  z w TEMP horiz; #  ADD cur_pos, fragment.texcoord, shift.zzww; # 1, 1 #   TEX cur_clr, cur_pos, texture, 2D; #  MOV horiz.x, cur_clr.r; # x y MOV horiz_g.x, cur_clr.g; ADD cur_pos, fragment.texcoord, shift.wzww; # 0, 1 TEX cur_clr, cur_pos, texture, 2D; MOV horiz.y, cur_clr.r; MOV horiz_g.y, cur_clr.g; ADD cur_pos, fragment.texcoord, shift.wxww; # 0, +1 TEX cur_clr, cur_pos, texture, 2D; MOV horiz.z, cur_clr.r; MOV horiz_g.z, cur_clr.g; ADD cur_pos, fragment.texcoord, shift.xxww; # +1, +1 TEX cur_clr, cur_pos, texture, 2D; MOV horiz.w, cur_clr.r; MOV horiz_g.w, cur_clr.g; # vert_g calculation skipped, it's done in a similar way
Final 'quadg'
variable, holding status of walls for the current quad, is calculated
just like familiar 'quadv'
:
Example 8. Finalizing the walls' status for the quad
TEMP quadg; TEX cur_clr, fragment.texcoord, texture, 2D; MOV quadg.xyzw, cur_clr.gggg; CMP quadg.yz, quad.x, horiz_g.zzww, quadg.yyzz; # x > quadg.yz = horiz_g.zw CMP quadg.x, quad.y, horiz_g.yyyy, quadg.xxxx; # y > quadg.x = horiz_g.y CMP quadg.xw, quad.z, horiz_g.xxyy, quadg.xxww; # z > quadg.xw = horiz_g.xy CMP quadg.z, quad.w, horiz_g.zzzz, quadg.zzzz; # w > quadg.z = horiz_g.z CMP quadg.w, quad.x, vert_g.zzzz, quadg.wwww; # x > quadg.w = vert_g.z CMP quadg.zw, quad.y, vert_g.zwzw, quadg.zwzw; # y > quadg.zw = vert_g.zw CMP quadg.y, quad.z, vert_g.yyyy, quadg.yyyy; # z > quadg.y = vert_g.y CMP quadg.xy, quad.w, vert_g.yxyx, quadg.xyxy; # w > quadg.xy = vert_g.yx
Finally, we have everything at hand to implement the rules of TM GAS themselves:
Example 9. Main logic of TM GAS automata
# normalize quadv and quadg hold exactly 0 or 1 SGE quadv, quadv, 0.5; SGE quadg, quadg, 0.5; # is_colliding flag means 2 particles collide  no rotation TEMP is_colliding; MUL is_colliding, quadv.xyzw, quadv.zwzw; # x = (x & z), y = (y & w) ADD is_colliding, is_colliding.xxxx, is_colliding.yyyy; # (x & z)  (y & w) # also no rotation if a wall detected inside the quad ADD is_colliding, is_colliding, quadg; # collide  wall in any of 4 components DP4 is_colliding, is_colliding, 1; # propagate result to all components # resulting color holder TEMP ret_clr; # even step  rotate clockwise, else counterclockwise CMP ret_clr.xyzw, program.env[0], quadv.yzwx, quadv.wxyz; # propagate to whole ret_clr  meaning every particle will have (R, 0, R, R) state CMP ret_clr.rba, quad.x, ret_clr.xxxx, ret_clr; CMP ret_clr.rba, quad.y, ret_clr.yyyy, ret_clr; CMP ret_clr.rba, quad.z, ret_clr.zzzz, ret_clr; CMP ret_clr.rba, quad.w, ret_clr.wwww, ret_clr; MOV ret_clr.g, 0; # make sure there is no wall (can't be since we rotated) CMP ret_clr, is_colliding, cur_clr, ret_clr; # no change if colliding or a wall MOV result.color, ret_clr; END
Note
Fun fact: TM GAS is completely reversible. No matter how far the simulation had gone, you can always revert the rules and eventually get the initial state.
"Critters" is another very interesting cellular automata, which evolutions resemble a colony of creatures, able to commute vertically and horizontally, and create smaller subcolonies. Interesting feature of this automata is that after some time, the number of creatures in the simulation seems to remain constant (though it's not obvious from the CA's rules). I also managed to create configurations similar in their behavior to oscillating strings, where energy is transported between modes of oscillation, but this is a topic for a later post.
"Critters" rules are a little bit more complex than those of TM GAS:

Every cell can be in one of the two states 
'1'
or'0'
. We will, of course, encore them as1.0
and0.0
in the ARB code. 
If a quad contains exactly two cells in
'1'
state, the quad is not modified. 
Else, every cell in the quad changes state to the complementary one, i.e.
'0'<>'1'
. 
In addition to the previous complementing rule, if the original quad had exactly three cells in
'1'
state, the quad rotates 180 degrees.
As you can see, the rules in their pure form would not be very pleasing to watch: empty quads are changed to filled quads and vice versa on every simulation step (read: lots of flickering). But, we can employ a color switching trick, similar to what we did with BelousovZhabotinsky for flickering reduction. I will comment on the trickrelated lines in the ARB code below.
Example 10. "Critters" CA main logic ARB implementation
# calculation of 'quad' and 'quadv' is skipped... # normalize quadv to hold 0 and 1 exactly SGE quadv, quadv, 0.5; # color switching trick (part 1) here  # invert quadv on odd steps, since 0s are actually should be treated as 1s then and vice versa TEMP quadv_inv; SUB quadv_inv, 1, quadv; CMP quadv, program.env[0], quadv_inv, quadv; TEMP num_neigh; DP4 num_neigh, quadv, 1; # num_neigh holds number of 1s in the quad # now to the actual CA's rules TEMP num_ge; TEMP num_le; TEMP is_num; # used as a tempropary flag # look if num_neigh == 3 SGE num_ge, num_neigh, 3; SGE num_le, num_neigh, 3; MUL is_num, num_le, num_ge; # is_num = (num_neigh == 3) # for num_neigh == 3, we should also rotate 180: # so, if num_neigh == 3, get opposite cell, if not, get current CMP cur_clr.xyzw, is_num, quadv.zwxy, quadv.xyzw; # propagate to whole cur_clr CMP cur_clr, quad.x, cur_clr.xxxx, cur_clr; CMP cur_clr, quad.y, cur_clr.yyyy, cur_clr; CMP cur_clr, quad.z, cur_clr.zzzz, cur_clr; CMP cur_clr, quad.w, cur_clr.wwww, cur_clr; # getting the complement: invert color, store in inv_color TEMP inv_clr; SUB inv_clr, 1, cur_clr; # look if num_neigh == 2 (we don't need to invert then) SGE num_ge, num_neigh, 2; SGE num_le, num_neigh, 2; MUL is_num, num_le, num_ge; # is_num = (num_neigh == 2) TEMP cur_clr_even; TEMP cur_clr_odd; # if num_neigh == 2, take the original color, else inverted one # also, employ the color switching trick (part 2): invert once more for odd steps to avoid flickering CMP cur_clr_even, is_num, inv_clr, cur_clr; CMP cur_clr_odd, is_num, cur_clr, inv_clr; CMP cur_clr, program.env[0], cur_clr_odd, cur_clr_even; MOV result.color, cur_clr; END
Ok, in this installment we have discovered how to:

Tackle generic problems of implementing any cellular automata with Margolus neighborhood on a GPU, specifically, how to transform 2x2 quad MN rules to ARB program: finding pixel position in the quad and finding exact states of four pixels of the quad.

Implement simple CA emulating gas with collisions, holding laws of momentum and energy conservation. Despite particles in the gas are able to move in only four directions and with a constant speed, sooner or later it uniformly fills all available area. Also, since it's reversible, you can actually get all the particles to their initial state by reversing the rules.

Implement another CA named "Critters", which features a rich set of configurations when simulated.

Conway's "Life" and "Brian's Brain" cellular automata using GPU
Introduction to a general structure of OpenGL "wrapper" program for implementation of CAs on a GPU. A couple of simple Moore neighborhood CAs implemented.

BelousovZhabotinsky reaction with hexagonal cellular automaton on a GPU
Another CA on GPU implementation, this time on a hexagonal grid.

A small page giving a definition of the MN.

Cellular Automata rules lexicon on MN
A little bit more extensive (than previous link) explanation, plus some example rule sets.

Toffoli, Margolus "Cellular Automata Machines: A New Environment for Modeling"
Excellent book on cellular automata, with physics references and examples in Fort language.
No comments:
Post a Comment