ಠ_ಠ: Margolus neighborhood in cellular automata: "TM GAS" and "Critters" on a GPU

Table of Contents

1. Introduction

2. Margolus neighborhood

3. General setup of a shader program

3.1. Determining current pixel's position in a quad
3.2. Checking correctness of the pixel position calculation
3.3. Finding out states of Margolus neighbors
3.4. Checking correctness of current quad state calculation

4. "TM GAS" - a gas with collisions

4.1. ARB implementation of TM GAS

5. "Critters" CA

6. Conclusion

7. Further reading

7.1. In this blog
7.2. Margolus neighborhood
7.3. Proprietary sources

1. Introduction

This is a continuation of the "CA on a GPU" theme. This time, I'm going to describe Margolus variation of CA neighborhoods, quite a different one from the Moore neighborhood. I will start by introducing the Margolus neighborhood (MN), then implement a generic part of an ARB program for MN CA emulation and talk about a couple of debugging techniques.

Last, two cellular automata will be considered, a simple colliding gas simulation (TM GAS), and "Critters" CA, having (probably) rather obscure physical meaning, discussion of which we are going to postpone for now.

As I will skip the description of general structure of OpenGL "wrapper" program, it might be useful to read the first post beforehand for the sake of comprehension.

2. Margolus neighborhood

Contrary to the Moore neighborhood, where rules specify what transformations each cell undergoes depending on its own state and the states of its neighbors, the Margolus neighborhood operates on 4-cells blocks, which I'll denote as quads here. That is, for every possible state of 2x2 block (a quad) of cells, there is a corresponding state, which the quad should take on the next time step. Plus, on every other step, lattice partitioning into quads is shifted to 1 cell both horizontally and vertically, lest all the quads would evolute independently of each other.

Figure 2. Alternating lattice partitioning with Margolus neighborhood

In the picture above, the same initial state of CA is shown, where only two cells are in "active" state. If we choose the partitioning scheme on the left, we would have to apply 3 rules to calculate next CA state: a rule for an empty quad (a lot of times), and 2 rules for a quad's state with only 1 active cell (one in up-right corner, another in up-left corner). If, by contrast, we choose the partitioning on the right, then both active cells are in the same quad, and we would apply only 2 rules: again, the one for the empty quad, and another one for the quad with 2 active cells on the lower row.

Irrespectively of initial partitioning chosen, CA simulation requires that we alternate between them on every step when simulating.

3. General setup of a shader program

Since any shader program operates on pixels, not quads, we need to be able to know where in the quad the current pixel is situated.

3.1. Determining current pixel's position in a quad

Let's try to set up an ARB temp variable 'quad', in such a way that it has only one of its four components set to 1, denoting the position of the current pixel in the quad. Let's define the following correspondence between XYZW components of 4-vector and quad's cells (blue corners shows alternating odd-steps quad partitioning):

Figure 3. Four possible positions of a cell in a quad

We will start with determining if X coordinate of the pixel is odd or even, do the same for Y coordinate, and store the result in some ARB temp variables:

Example 1. Odd/even testing for X and Y axes

TEMP shift;                                # remember, we pass 1/TEX_WIDTH as a parameter to the program
MOV  shift, program.env[1];                # x - positive,  z - negative

# get fragment position in the quad
TEMP mod_pos;
RCP  mod_pos, shift.x;                     # mod_pos = (1/shift.x, 1/shift.x, ...)
MUL  mod_pos, fragment.texcoord, mod_pos;  # mod_pos = (x/shift.x, y/shift.x, ...)
                                           # since fragment.texcoord is (step/2+step*N), where N is natural number,
SUB  mod_pos, mod_pos, 0.5;                # we need to subtract 0.5 for the first term
MUL  mod_pos, mod_pos, 0.5;                # and finally divide by 2 to get either natural M or M+0.5
FRC  mod_pos, mod_pos;                     # get fraction - mod_pos.xy are now either ~0 or ~0.5
ADD  mod_pos, mod_pos, 0.25;               # mod_pos.xy are either ~0.25 or ~0.75

TEMP quad_x;                               #   y| z     0| 1     1| 0
SGE  quad_x.zw, mod_pos.xxxx, 0.5;         #  --|--    --|--    --|--
SLT  quad_x.xy, mod_pos.xxxx, 0.5;         #   x| w     0| 1     1| 0
                                           #           x>0.5     x<0.5

TEMP quad_y;                               #   y| z     1| 1     0| 0
SGE  quad_y.yz, mod_pos.yyyy, 0.5;         #  --|--    --|--    --|--
SLT  quad_y.xw, mod_pos.yyyy, 0.5;         #   x| w     0| 0     1| 1
                                           #           y>0.5     y<0.5

At this point, quad_x variable has its left "column" set to 1s when X coordinate of the pixel is even, and right column, when it's odd. Same for quad_y, but "rows" are being set up instead of "columns". All we have left to do is to multiply them and adjust for odd steps, where we have to shift the partitioning:

Example 2. Finalizing cell's position in the quad

TEMP quad;
MUL  quad, quad_x, quad_y;                 # quad contains one and only one component set to 1

# shift the quad for odd steps
CMP  quad.xyzw, -program.env[0], quad.zwxy, quad.xyzw;

This CMP operation above kicks in only when program.env[0]>0, and we surely set it to be 1 only for odd steps of the simulation (example on Java/jogl):

    int _flip =  1;
    float _pr_param[][] = new float[][] { { 0, 0, 0, 0 }, { 1, 1, 1, 1 } };

    gl.glProgramEnvParameter4fvARB(GL.GL_FRAGMENT_PROGRAM_ARB,
                                   0, _pr_param[_flip = 1 - _flip],
                                   0);

For odd steps, we are swapping X<->Z, Y<->W components of the quad. Indeed, it's clear from the picture, that, for example, Y (or up-left) position in the quad is, in fact, W (or down-right) position in the shifted quad.

3.2. Checking correctness of the pixel position calculation

One might wonder how can we check that we got everything right in the ARB code from the previous section. After all, there is quite a lot of ARB code, doing lots of shady operations with fractional values, trying to get exact 0s and 1s in the end...

The answer is the following approach, roughly equivalent to the so called "printf debugging" in other languages:

Example 3. Testing pixel quad positioning calculation

!!ARBfp1.0

# quad variable calculation goes here, skipped...

TEMP check;                                # checking that quad is correct
MOV  check, 0;
CMP  check.r, -quad.x, 1, check;
CMP  check.g, -quad.y, 1, check;
CMP  check.b, -quad.z, 1, check;
CMP  check.a, -quad.w, 1, check;           # check is red/gree/blue/black for DL/UL/UR/DR pixel position correspondingly

MOV result.color, check;
END

This code will result in drawing of different patterns for odd and even steps, both consisting of only red, green, blue, and black colors. If there is a bug in 'quad' variable calculation (like, several components are set to 1s instead of only one), the rendered picture could contain more colors than these four.

3.3. Finding out states of Margolus neighbors

Now, to implement the logic of any Margolus neighborhood CA, we need to know the states of the current cell (obtained trivially via TEX instruction), and states of other three neighbors in the current quad. Let's say we have R component encoding the state of the cell, and try to define a 4-vector variable 'quadv' (short for "quad values") holding the significant R component of all quad member cells as its XYZW components:

Figure 4. A couple of examples of 'quadv' variable definition (C means current pixel)

First, let's calculate two temporary variables, 'vert' and 'horiz', holding significant components of all 8 Moore neighbors as theirs XYZW components. I will divide 8 neighbors to two 4-vectors based on the following scheme:

Figure 5. Storing Moore neighbors temporarily

Here is the ARB code to get Moore neighbors and store them in two 4-vectors:

Example 4. Calculation of 'horiz' and 'vert' temporary variables

TEMP cur_pos;                                  # remember, shift.x is positive shift, shift.z - negative
TEMP cur_clr;

# let R component be significant
                                                            #    | z| w
TEMP horiz;                                                 #  --|--|--
ADD  cur_pos, fragment.texcoord, shift.zzww;   # -1, -1     #    | c|
TEX  cur_clr, cur_pos, texture, 2D;                         #  --|--|--
  MOV  horiz.x, cur_clr.r;                                  #   x| y|
ADD  cur_pos, fragment.texcoord, shift.wzww;   #  0, -1
TEX  cur_clr, cur_pos, texture, 2D;
  MOV  horiz.y, cur_clr.r;
ADD  cur_pos, fragment.texcoord, shift.wxww;   #  0, +1
TEX  cur_clr, cur_pos, texture, 2D;
  MOV  horiz.z, cur_clr.r;
ADD  cur_pos, fragment.texcoord, shift.xxww;   # +1, +1
TEX  cur_clr, cur_pos, texture, 2D;
  MOV  horiz.w, cur_clr.r;
                                                            #   x|  |
TEMP vert;                                                  #  --|--|--
ADD  cur_pos, fragment.texcoord, shift.zxww;   # -1, +1     #   y| c| z
TEX  cur_clr, cur_pos, texture, 2D;                         #  --|--|--
  MOV  vert.x, cur_clr.r;                                   #    |  | w
ADD  cur_pos, fragment.texcoord, shift.zwww;   # -1,  0
TEX  cur_clr, cur_pos, texture, 2D;
  MOV  vert.y, cur_clr.r;
ADD  cur_pos, fragment.texcoord, shift.xwww;   # +1,  0
TEX  cur_clr, cur_pos, texture, 2D;
  MOV  vert.z, cur_clr.r;
ADD  cur_pos, fragment.texcoord, shift.xzww;   # +1, -1
TEX  cur_clr, cur_pos, texture, 2D;
  MOV  vert.w, cur_clr.r;

Second, as soon as we have 'horiz' and 'vert' calculated, it should be easy to carefully select correct components (based on known position of the current cell in the quad) and store them as 'quadv':

Example 5. Finalizing 'quadv' calculation

# quadv variable holds all 4 values of the current quad
TEMP quadv;
TEX  cur_clr, fragment.texcoord, texture, 2D;
MOV  quadv.xyzw, cur_clr.rrrr;                       # first, fill all quadv components with the current cell's state

# then, re-write 3 quadv components corresponding to the Margolus neighbors
CMP  quadv.yz, -quad.x, horiz.zzww, quadv.yyzz;      # x -> quadv.yz = horiz.zw
CMP  quadv.x,  -quad.y, horiz.yyyy, quadv.xxxx;      # y -> quadv.x  = horiz.y
CMP  quadv.xw, -quad.z, horiz.xxyy, quadv.xxww;      # z -> quadv.xw = horiz.xy
CMP  quadv.z,  -quad.w, horiz.zzzz, quadv.zzzz;      # w -> quadv.z  = horiz.z

CMP  quadv.w,  -quad.x, vert.zzzz, quadv.wwww;       # x -> quadv.w  = vert.z
CMP  quadv.zw, -quad.y, vert.zwzw, quadv.zwzw;       # y -> quadv.zw = vert.zw
CMP  quadv.y,  -quad.z, vert.yyyy, quadv.yyyy;       # z -> quadv.y  = vert.y
CMP  quadv.xy, -quad.w, vert.yxyx, quadv.xyxy;       # w -> quadv.xy = vert.yx

In the code fragment above, we're rewriting 3 and only 3 components of 'quadv', corresponding to three Margolus neighbors. For example, if current pixel happens to be down-left pixel in the quad (meaning quad=(1, 0, 0, 0)), then we re-write quadv.yz (upper row) components with horiz.zw, and quadv.w (down-right cell) with vert.z. Similarly, other cases of current cell's position in the quad are handled.

3.4. Checking correctness of current quad state calculation

It's possible to check whether 'quadv' got filled correctly or not, using similar "printf debugging" method we used before. If we do an identity transformation, i.e:

Example 6. Identity drawing based on calculated 'quadv'

TEMP check;                                          # checking that quadv is correct
CMP  check, -quad.x, quadv.xxxx, check;              # if current pixel is at X (down-left) position,
                                                     # use quadv.x value
CMP  check, -quad.y, quadv.yyyy, check;              # Y position -> quadv.y value
CMP  check, -quad.z, quadv.zzzz, check;              # etc.
CMP  check, -quad.w, quadv.wwww, check;

MOV  result.color, check;
END

Then we should see the initial state on every step of the simulation. If CA state changes, that would mean 'quadv' values were calculated incorrectly.

4. "TM GAS" - a gas with collisions

At this point, we have enough background and utility code to implement an actual CA with Margolus neighborhood. We will start with so-called TM GAS automata (using terminology from this great book).

Consider the following simple Margolus neighborhood rule: rotate every quad clockwise on even steps of the simulation, and counter-clockwise on odd ones. Here is what happens when every cell can be in one of two states (1 - gas particle is present in the cell, 0 - no gas particle in the cell):

Figure 6. Gas without collisions

So, particles move horizontally (and vertically), not noticing each other. This is a very simple model of a gas, since there are only four moving directions, constant speed, and no collisions. We can introduce collisions by adding one more rule: do not rotate a quad, if there are exactly two particles and they're moving in the opposite directions. Note that such rule doesn't break the momentum conversation law:

Figure 7. TM GAS collision simulation

In the picture above, two particles arrive to the same quad after counter-clockwise rotation at t=-1. Since there are exactly two particles in the quad, and they would have normally moved in the opposite direction, rule 2 of TM GAS denies clockwise rotation of the quad at t=0. Then simulation proceed as usual at t=1, and particles do move (now vertically), since they are not in the same quad anymore.

Another good enhancement of the TM GAS is introduction of walls. Let's say there is a 3^rd possible state of a cell - the wall state. Let's not rotate quads having at least one wall in them at all. It's easy to see that this rule leads to particles reflecting from the walls (I will left this fact for the reader to prove).

Since all rules of TM GAS results in only rotations of quads, there is no way that a particle will penetrate the wall, if there was no such particle (inside the wall) in the initial state. Also, this means that the number of particles is constant (meaning total energy of the system is constant too).

4.1. ARB implementation of TM GAS

First, let's define how do we encode the three possible states of a cell:

Empty space is encoded as (0, 0, 0, 0).
A particle is encoded as (1, 0, *, *) , where Z and W components are actually insignificant.
A wall is encoded as (*, 1, *, *), where only G component is significant.

Second, since we have walls now, we'll need to extend 'quadv' calculation a little bit, and define also 'quadg' variable, holding wall information for the current quad, if any:

Example 7. Respecting walls while calculating 'horiz' temporary variable

# 'r' component is significant for particles
# 'g' component - for walls

TEMP horiz_g;                                                 #    | z| w
TEMP horiz;                                                   #  --|--|--
ADD  cur_pos, fragment.texcoord, shift.zzww;   # -1, -1       #    |  |
TEX  cur_clr, cur_pos, texture, 2D;                           #  --|--|--
  MOV    horiz.x, cur_clr.r;                                  #   x| y|
  MOV  horiz_g.x, cur_clr.g;
ADD  cur_pos, fragment.texcoord, shift.wzww;   #  0, -1
TEX  cur_clr, cur_pos, texture, 2D;
  MOV    horiz.y, cur_clr.r;
  MOV  horiz_g.y, cur_clr.g;
ADD  cur_pos, fragment.texcoord, shift.wxww;   #  0, +1
TEX  cur_clr, cur_pos, texture, 2D;
  MOV    horiz.z, cur_clr.r;
  MOV  horiz_g.z, cur_clr.g;
ADD  cur_pos, fragment.texcoord, shift.xxww;   # +1, +1
TEX  cur_clr, cur_pos, texture, 2D;
  MOV    horiz.w, cur_clr.r;
  MOV  horiz_g.w, cur_clr.g;

# vert_g calculation skipped, it's done in a similar way

Final 'quadg' variable, holding status of walls for the current quad, is calculated just like familiar 'quadv':

Example 8. Finalizing the walls' status for the quad

TEMP quadg;
TEX  cur_clr, fragment.texcoord, texture, 2D;
MOV  quadg.xyzw, cur_clr.gggg;

CMP  quadg.yz, -quad.x, horiz_g.zzww, quadg.yyzz;             # x -> quadg.yz = horiz_g.zw
CMP  quadg.x,  -quad.y, horiz_g.yyyy, quadg.xxxx;             # y -> quadg.x  = horiz_g.y
CMP  quadg.xw, -quad.z, horiz_g.xxyy, quadg.xxww;             # z -> quadg.xw = horiz_g.xy
CMP  quadg.z,  -quad.w, horiz_g.zzzz, quadg.zzzz;             # w -> quadg.z  = horiz_g.z

CMP  quadg.w,  -quad.x, vert_g.zzzz, quadg.wwww;              # x -> quadg.w  = vert_g.z
CMP  quadg.zw, -quad.y, vert_g.zwzw, quadg.zwzw;              # y -> quadg.zw = vert_g.zw
CMP  quadg.y,  -quad.z, vert_g.yyyy, quadg.yyyy;              # z -> quadg.y  = vert_g.y
CMP  quadg.xy, -quad.w, vert_g.yxyx, quadg.xyxy;              # w -> quadg.xy = vert_g.yx

Finally, we have everything at hand to implement the rules of TM GAS themselves:

Example 9. Main logic of TM GAS automata

# normalize quadv and quadg hold exactly 0 or 1
SGE  quadv, quadv, 0.5;
SGE  quadg, quadg, 0.5;

# is_colliding flag means 2 particles collide - no rotation
TEMP is_colliding;
MUL  is_colliding, quadv.xyzw, quadv.zwzw;                    # x = (x & z), y = (y & w)
ADD  is_colliding, is_colliding.xxxx, is_colliding.yyyy;      # (x & z) || (y & w)

# also no rotation if a wall detected inside the quad
ADD  is_colliding, is_colliding, quadg;                       # collide || wall in any of 4 components
DP4  is_colliding, is_colliding, 1;                           # propagate result to all components

# resulting color holder
TEMP ret_clr;

# even step - rotate clockwise, else counter-clockwise
CMP ret_clr.xyzw, -program.env[0], quadv.yzwx, quadv.wxyz;

# propagate to whole ret_clr - meaning every particle will have (R, 0, R, R) state
CMP  ret_clr.rba, -quad.x, ret_clr.xxxx, ret_clr;
CMP  ret_clr.rba, -quad.y, ret_clr.yyyy, ret_clr;
CMP  ret_clr.rba, -quad.z, ret_clr.zzzz, ret_clr;
CMP  ret_clr.rba, -quad.w, ret_clr.wwww, ret_clr;
MOV  ret_clr.g, 0;                                            # make sure there is no wall (can't be since we rotated)

CMP ret_clr, -is_colliding, cur_clr, ret_clr;                 # no change if colliding or a wall

MOV result.color, ret_clr;
END

Note

Fun fact: TM GAS is completely reversible. No matter how far the simulation had gone, you can always revert the rules and eventually get the initial state.

5. "Critters" CA

"Critters" is another very interesting cellular automata, which evolutions resemble a colony of creatures, able to commute vertically and horizontally, and create smaller sub-colonies. Interesting feature of this automata is that after some time, the number of creatures in the simulation seems to remain constant (though it's not obvious from the CA's rules). I also managed to create configurations similar in their behavior to oscillating strings, where energy is transported between modes of oscillation, but this is a topic for a later post.

"Critters" rules are a little bit more complex than those of TM GAS:

Every cell can be in one of the two states - '1' or '0'. We will, of course, encore them as 1.0 and 0.0 in the ARB code.
If a quad contains exactly two cells in '1' state, the quad is not modified.
Else, every cell in the quad changes state to the complementary one, i.e. '0'<->'1'.
In addition to the previous complementing rule, if the original quad had exactly three cells in '1' state, the quad rotates 180 degrees.

As you can see, the rules in their pure form would not be very pleasing to watch: empty quads are changed to filled quads and vice versa on every simulation step (read: lots of flickering). But, we can employ a color switching trick, similar to what we did with Belousov-Zhabotinsky for flickering reduction. I will comment on the trick-related lines in the ARB code below.

Example 10. "Critters" CA main logic ARB implementation

# calculation of 'quad' and 'quadv' is skipped...

# normalize quadv to hold 0 and 1 exactly
SGE  quadv, quadv, 0.5;

# color switching trick (part 1) here - 
# invert quadv on odd steps, since 0s are actually should be treated as 1s then and vice versa
TEMP quadv_inv;
SUB  quadv_inv, 1, quadv;
CMP  quadv, -program.env[0], quadv_inv, quadv;

TEMP num_neigh;
DP4  num_neigh, quadv, 1;                      # num_neigh holds number of 1s in the quad

# now to the actual CA's rules
TEMP num_ge;
TEMP num_le;
TEMP is_num;                                   # used as a tempropary flag

# look if num_neigh == 3
SGE  num_ge, num_neigh,   3;
SGE  num_le, -num_neigh, -3;
MUL  is_num, num_le, num_ge;                   # is_num = (num_neigh == 3)

# for num_neigh == 3, we should also rotate 180:
# so, if num_neigh == 3, get opposite cell, if not, get current
CMP cur_clr.xyzw, -is_num, quadv.zwxy, quadv.xyzw;

# propagate to whole cur_clr
CMP  cur_clr, -quad.x, cur_clr.xxxx, cur_clr;
CMP  cur_clr, -quad.y, cur_clr.yyyy, cur_clr;
CMP  cur_clr, -quad.z, cur_clr.zzzz, cur_clr;
CMP  cur_clr, -quad.w, cur_clr.wwww, cur_clr;

# getting the complement: invert color, store in inv_color
TEMP inv_clr;
SUB  inv_clr, 1, cur_clr;

# look if num_neigh == 2 (we don't need to invert then)
SGE  num_ge,  num_neigh,  2;
SGE  num_le, -num_neigh, -2;
MUL  is_num, num_le, num_ge;                   # is_num = (num_neigh == 2)

TEMP cur_clr_even;
TEMP cur_clr_odd;

# if num_neigh == 2, take the original color, else inverted one
# also, employ the color switching trick (part 2): invert once more for odd steps to avoid flickering
CMP cur_clr_even, -is_num, inv_clr, cur_clr;
CMP cur_clr_odd,  -is_num, cur_clr, inv_clr;
CMP cur_clr,      -program.env[0], cur_clr_odd, cur_clr_even;

MOV result.color, cur_clr;
END

6. Conclusion

Ok, in this installment we have discovered how to:

Tackle generic problems of implementing any cellular automata with Margolus neighborhood on a GPU, specifically, how to transform 2x2 quad MN rules to ARB program: finding pixel position in the quad and finding exact states of four pixels of the quad.
Implement simple CA emulating gas with collisions, holding laws of momentum and energy conservation. Despite particles in the gas are able to move in only four directions and with a constant speed, sooner or later it uniformly fills all available area. Also, since it's reversible, you can actually get all the particles to their initial state by reversing the rules.

Figure 8. TM GAS colliding with the wall
Implement another CA named "Critters", which features a rich set of configurations when simulated.

Figure 9. "Critters" expanding the colony

7. Further reading

7.1. In this blog

Conway's "Life" and "Brian's Brain" cellular automata using GPU

Introduction to a general structure of OpenGL "wrapper" program for implementation of CAs on a GPU. A couple of simple Moore neighborhood CAs implemented.
Belousov-Zhabotinsky reaction with hexagonal cellular automaton on a GPU

Another CA on GPU implementation, this time on a hexagonal grid.

7.2. Margolus neighborhood

The Margolus neighbourhood

A small page giving a definition of the MN.
Cellular Automata rules lexicon on MN

A little bit more extensive (than previous link) explanation, plus some example rule sets.

7.3. Proprietary sources

Toffoli, Margolus "Cellular Automata Machines: A New Environment for Modeling"

Excellent book on cellular automata, with physics references and examples in Fort language.

ಠ_ಠ

Mar 28, 2010

Margolus neighborhood in cellular automata: "TM GAS" and "Critters" on a GPU