Quicklife Algorithm

For general discussion about Conway's Game of Life.
moony
Posts: 8
Joined: October 22nd, 2017, 10:55 pm

Quicklife Algorithm

Post by moony » January 23rd, 2018, 11:16 am

Could someone give/find me an explanation for Quicklife or a similar nonhashing algorithm?
I'm developing a simulation program for Life, Life-like, and Non-totalistic rules and am planning on using the quicklife algorithm. (High memory overhead really isn't something I want, and a bounded grid is already in mind.)

EDIT: Forgot to note that I have already googled it.

rokicki
Posts: 80
Joined: August 6th, 2009, 12:26 pm

Re: Quicklife Algorithm

Post by rokicki » January 23rd, 2018, 12:53 pm

A minutely detailed description is in qlifealgo.h and qlifealgo.cpp in the Golly sources.

Quicklife uses a tree structure for the universe. It's not a quadtree; instead, each level of the tree increases one of the dimensions by a factor of 8. But by using a tree, there are no coordinates and thus no bound on the size of the universe.

There are distinct structures for 4x8, 32x8, and 32x32 sections of the universe. From there the sizes at each level are 256x32, 256x256, 2048x256, 2048x2048, and so on. The tree contains both an even and odd generation, along with flags indicating if that subtree is stable, oscillating with period 2, or active.

The calculation subroutines get passed four adjacent subtrees (so there's no need to "search" for adjacent sections of the universe, nor for a parent pointer).
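
In outline, the structures look something like this -- an illustrative sketch only, since the real declarations in qlifealgo.h are considerably more involved and the packing here is made up:

Code:

// Not the real qlifealgo declarations -- just the shape of the idea.
// A leaf stores bit-packed cells for both the even and the odd
// generation; an interior node glues eight children side by side,
// so each level up multiplies one dimension by 8.
struct brick {                  // a small chunk of cells, e.g. 4x8
    unsigned short even[8];     // hypothetical packing: one word per row
    unsigned short odd[8];      // the other generation's copy
};

struct supertile {
    struct supertile *child[8]; // eight adjacent subtrees
    int flags;                  // STABLE / PERIOD2 / ACTIVE summary bits
};

// Update routines are handed four adjacent subtrees at once, so no
// parent pointers or coordinate searches are ever needed; roughly:
//   void update(supertile *a, supertile *b, supertile *c, supertile *d);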

Quicklife is probably way overengineered, and it was designed and built in a very different world from today's (slower CPUs, clock speeds that were slower relative to memory speed, 32-bit rather than 64-bit CPUs, etc.). There's probably at least a factor of two speedup possible if we reengineer it for today's CPUs.

But if you're starting from scratch, the main thing is to keep it as simple as possible, and only add what you need when you need it. Likely a much simpler algorithm will do what you need. I'm sure there are lots of people here who would be delighted to discuss possible algorithms with you.

dvgrn
Moderator
Posts: 10612
Joined: May 17th, 2009, 11:00 pm
Location: Madison, WI

Re: Quicklife Algorithm

Post by dvgrn » January 23rd, 2018, 1:07 pm

moony wrote:Could someone give/find me an explanation for Quicklife or a similar nonhashing algorithm?
I'm developing a simulation program for Life, Life-like, and Non-totalistic rules and am planning on using the quicklife algorithm. (High memory overhead really isn't something I want, and a bounded grid is already in mind.)

EDIT: Forgot to note that I have already googled it.
Heh, I just tried that, and didn't find much either.

The fastest non-hashing algorithms all use various low-level bit manipulations -- shifting, masking, XOR, recombining -- to effectively find the future state of a whole row of cells, say 32 or 64 cells, in one calculation.
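
To make that concrete, here's one way (of many) to advance 64 cells of one row in a handful of bitwise operations, with B3/S23 hardcoded. This is a sketch of the technique, not production code -- a real engine also stitches in bits from the neighboring words:

Code:

#include <cstdint>

// One generation of B3/S23 for 64 cells packed one per bit, given the
// rows above and below.  Bits shifted off either end are treated as
// dead, so cells at the word boundaries are handled incorrectly here.
uint64_t step_row(uint64_t above, uint64_t cur, uint64_t below) {
    // 2-bit horizontal sum (left + center + right) of one row.
    auto hsum = [](uint64_t r, uint64_t &lo, uint64_t &hi) {
        uint64_t l = r << 1, rt = r >> 1;
        lo = l ^ r ^ rt;                  // parity of the three bits
        hi = (l & r) | (rt & (l ^ r));    // carry: at least two were set
    };
    uint64_t a0, a1, b0, b1, c0, c1;
    hsum(above, a0, a1);
    hsum(cur,   b0, b1);
    hsum(below, c0, c1);

    // Add the three 2-bit sums into a 4-bit count n per cell; n counts
    // the whole 3x3 neighborhood *including* the center cell.
    uint64_t n0 = a0 ^ b0 ^ c0;
    uint64_t k0 = (a0 & b0) | (c0 & (a0 ^ b0));  // carry out of bit 0
    uint64_t s  = a1 ^ b1 ^ c1;
    uint64_t c3 = (a1 & b1) | (c1 & (a1 ^ b1));  // pairs in the 2s column
    uint64_t n1 = s ^ k0;
    uint64_t c4 = s & k0;
    uint64_t n2 = c3 ^ c4;
    uint64_t n3 = c3 & c4;

    // With the center included, "alive next tick" is exactly n == 3,
    // or n == 4 with the center currently alive.
    uint64_t three = n0 & n1 & ~n2 & ~n3;
    uint64_t four  = ~n0 & ~n1 & n2 & ~n3;
    return three | (four & cur);
}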

Depending partly on the language you're developing in, you may or may not want to tackle all of that. Even without bit-twiddling tricks, there are a number of easy improvements on the naive algorithm.

(Naive algorithm summary: loop through all cells in the bounded grid of Universe 1, calculate each cell's next state and record it in Universe 2; when the loop is done, swap Universes 1 and 2.)
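
In code, that's just something like this (a minimal sketch; the grid size and cell packing are arbitrary, and both vectors are assumed to have size W*H):

Code:

#include <cstdint>
#include <utility>
#include <vector>

const int W = 640, H = 480;    // any bounded grid size

// Neighbor count for (x, y), treating out-of-range cells as dead.
int count8(const std::vector<uint8_t> &g, int x, int y) {
    int n = 0;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            if (dx == 0 && dy == 0) continue;
            int nx = x + dx, ny = y + dy;
            if (nx >= 0 && nx < W && ny >= 0 && ny < H)
                n += g[ny * W + nx];
        }
    return n;
}

// Compute Universe 2 from Universe 1, then swap them.
void naive_step(std::vector<uint8_t> &univ1, std::vector<uint8_t> &univ2) {
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            int n = count8(univ1, x, y);
            // B3/S23: born with 3 neighbors, survives with 2 or 3
            univ2[y * W + x] = (n == 3) || (n == 2 && univ1[y * W + x]);
        }
    std::swap(univ1, univ2);
}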

My suggestion would be to start by reading Alan Hensel's general summary for his fast Java applet -- there are a lot of good clues in there. QuickLife was originally partly inspired by that algorithm.

Then if you really want to know how QuickLife works, there really may not be any better place to start than the source code, as Tom has suggested. Don't worry about reading the actual code at first, though. Just go through and read all the comments, top to bottom -- they're unusually good as these things go.

If that seems too complicated, then consider this for a start (if it isn't what you're doing already). In your bounded grid, keep a simple hash table containing all your ON cells. At each tick, make a list of the neighbors of each of your ON cells. For each of those cells, check to see if *its* neighbors are in the hash table. If you count the right number of neighbors, add that cell to a new hash table. Then check each of your ON cells and add them to the same table, but only if they should remain ON. Clear the universe, draw the cells in the new hash table, and repeat.
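
Here's one compact way that might look in C++. This version tallies neighbor counts in a map rather than re-probing the table cell by cell, which amounts to the same bookkeeping (a sketch only; clamping to the bounded grid is omitted):

Code:

#include <cstdint>
#include <unordered_map>
#include <unordered_set>

using CellSet = std::unordered_set<uint64_t>;

// Pack a coordinate pair into one hashable key.
inline uint64_t key(int x, int y) {
    return ((uint64_t)(uint32_t)y << 32) | (uint32_t)x;
}

// One B3/S23 generation over a set of ON cells.
CellSet step(const CellSet &on) {
    // Tally how many ON cells touch each candidate cell.
    std::unordered_map<uint64_t, int> ncount;
    for (uint64_t c : on) {
        int x = (int32_t)(uint32_t)c, y = (int32_t)(c >> 32);
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++)
                if (dx || dy)
                    ncount[key(x + dx, y + dy)]++;
    }
    CellSet next;
    for (const auto &[cell, n] : ncount)
        if (n == 3 || (n == 2 && on.count(cell)))  // birth or survival
            next.insert(cell);
    return next;
}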

This works surprisingly well for moderate-sized bounded grids in many rules, just because many rules settle into fairly sparse arrangements of scattered cells. For all its disadvantages, at least this algorithm never spends any time looking at big empty spaces.

If that seems too simple... an alternative you might want to investigate is Calcyman's lifelib, which is somewhat faster than QuickLife, I believe. However, it's probably faster partly because it uses even more tricks specific to modern CPUs that I only vaguely understand -- maybe loop unrolling, and maybe clever use of cache memory, and so on.

rokicki
Posts: 80
Joined: August 6th, 2009, 12:26 pm

Re: Quicklife Algorithm

Post by rokicki » January 23rd, 2018, 1:20 pm

Another nice, simple algorithm that not many people try is to represent the universe as a list of rows. Each row has a y coordinate and a list of x coordinates of on-cells. That's it.

To calculate, you scan three rows at a time, calculating a new row. The code for this is perhaps a bit clever but it's not too bad.

And just like the algorithm suggested by Dave above, it doesn't waste time "working" on vast regions of empty space. The amount of memory required is linear in the number of on cells. The size of the universe is bounded only by the type you use to represent a coordinate value.
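
In C++ terms, the representation is nothing more than this (names hypothetical):

Code:

#include <vector>

// The whole universe: one entry per non-empty row, kept sorted by y.
// Memory use is linear in the number of ON cells.
struct Row {
    int y;                   // the row's y coordinate
    std::vector<int> xs;     // sorted x coordinates of the ON cells
};
using Universe = std::vector<Row>;

// To step: walk the rows with three cursors (y-1, y, y+1), merge the
// three sorted x-lists, and emit a new row wherever a birth or a
// survival occurs.  That merge loop is the "perhaps a bit clever" part.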

If you use Calcyman's lifelib, remember that it's a header-only library. Here's sample code showing how you might use it.

Code:

#include "pattern2.h"
#include <fstream>

long long memsize = 1000 ;
long long stepsize = 1024 ;
long long maxgen = -1 ;

void info(uint64_t g, apg::pattern &x) {
    std::cout << g << " " << x.popcount(1000000007) << std::endl ;
}
int main(int argc, char *argv[]) {
    while (argc > 1 && argv[1][0] == '-') {
        argc-- ;
        argv++ ;
        switch(argv[0][1]) {
case 'M':
           memsize = atoll(argv[1]) ;
           argc-- ;
           argv++ ;
           break ;
case 'i':
           stepsize = atoll(argv[1]) ;
           argc-- ;
           argv++ ;
           break ;
case 'm':
           maxgen = atoll(argv[1]) ;
           argc-- ;
           argv++ ;
           break ;
        }
    }
    apg::lifetree<uint32_t, 1> lt(memsize) ;
    apg::pattern x(&lt, argv[1]) ;
    info(0, x) ;
    for (uint64_t i = 0; maxgen < 0 || i < maxgen; i += stepsize) {
        x = x[stepsize];
        info(i+stepsize, x) ;
    }
    return 0;
}
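
To try it, something like g++ -O3 -std=c++11 sample.cpp should build it, assuming pattern2.h from the lifelib sources is on your include path (the file name is hypothetical). Then ./a.out -i 4096 -m 1000000 pattern.rle would print the population every 4096 generations up to generation 1000000, using the flags parsed above.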

vyznev
Posts: 27
Joined: April 23rd, 2016, 4:08 am

Re: Quicklife Algorithm

Post by vyznev » January 23rd, 2018, 1:29 pm

I don't know what exactly QuickLife does, since I've never checked, but here's a description I wrote of a reasonably efficient CA simulation algorithm.

Basically, the general principles you should follow in implementing a CA simulator (roughly in order of descending importance) are:
  • Avoid redundant memory access. It's expensive. If you need to use the same value twice, load it once into a local variable (which the compiler can store in a register).
  • If you must access a memory location twice, try to at least make sure it stays in CPU cache. For large 2D grids, breaking the grid into multiple smaller tiles may help with this.
  • Try to keep your memory access sequential, to make optimal use of prefetching.
  • Keep your inner simulation loop as small as possible, to make sure it fits fully in cache.
  • Avoid branches where possible. If you must have branches, try to keep them as predictable as possible.
  • Use bitwise operations where possible. It's a lot faster to check if, say, 8 cells are all dead if their states are packed into a single byte than if you have to loop over an 8-element array. Note that most processors nowadays have efficient bit-counting instructions, although accessing them from e.g. C code may require using non-standard compiler features. Even portable bit-twiddling hacks can still be faster than naïve looping, though. (There's a small example right after this list.)
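
For example (the builtin below is GCC/Clang-specific; MSVC spells it __popcnt64, and C++20 standardizes std::popcount in <bit>):

Code:

#include <cstdint>

// Count live cells in a 64-cell word: typically one POPCNT instruction
// on modern CPUs, versus a 64-iteration loop over an unpacked array.
inline int live_cells(uint64_t row) {
    return __builtin_popcountll(row);   // GCC/Clang builtin
}

// And "are these 8 cells all dead?" is a single compare on a packed byte:
inline bool all_dead(uint8_t cells) { return cells == 0; }
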
Most of the above applies regardless of what specific algorithm you decide to use, but after that you get a bunch of decisions to make:
  • Do you want to optimize your code for dense chaotic patterns, or for sparse and highly repetitive ones? Or aim for a compromise? What about patterns that are dense but mostly stable?
  • What kinds of grid boundary conditions (including "theoretically boundless") do you want to support?
  • What kinds of rules do you want to support? Just B3/S23? Any totalistic rule? Non-totalistic isotropic rules? Non-isotropic rules? And what about rules with B0?
  • Also, which neighborhoods do you want to support? Just the 8-cell Moore neighborhood, or also 4- and 6-cell neighborhoods? What about more than 8 cells? Or Margolus grids?
  • How many states? Just live and dead? Live, dead and dying (as in Generations)? Any number of states?
  • Is a single-threaded simulation good enough for you, or do you want to make use of multiple cores? Or even run it on a GPU? Or a cluster of multiple nodes?
  • Do you want your code to provide a graphical view of the lattice, or is it just for "headless" brute force simulation? If the former, optimizing your graphics code is at least as important as optimizing the simulation itself.
In particular, simple brute-force code that just iterates over a bit-packed array of cells row by row is likely to be optimal for highly chaotic rules and patterns where most cells change unpredictably all the time. At least as long as the grid is small enough that three rows of it fit in the CPU cache; past that point, breaking up the grid into separate columns or tiles may be useful. Packing the states of 32 or 64 adjacent cells (or possibly more, if your compiler supports wider integer types) into single words and using bitslicing / SIMD-within-a-register techniques to compute their evolution in parallel can also be effective, especially if you can hardcode the CA rule and use simple bitwise operations to compute it.

On the other hand, for sparse (or at least sparsely active) patterns, your biggest performance gains will come from not simulating stable cells at all, e.g. by maintaining a list of cells that changed during the previous step, and only iterating over them and their neighbors. But this is slower than linear brute-force simulation on dense patterns, and can cause a lot more cache misses and branch mispredictions.

A useful compromise can be to split the grid into tiles and use an activity list to decide which tiles need updating, while using a fast linear simulation algorithm within each tile. You could even have multiple algorithms for simulating a tile, and switch dynamically between them e.g. based on the number of cells changed during the last step, so that a tile containing only a single oscillator or a glider could be simulated using an active cell list, while another tile containing a complex chaotic reaction could be handled by a linear row-by-row loop.
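
A rough skeleton of that bookkeeping might look like this (hypothetical types and names, with the per-tile kernel elided):

Code:

#include <cstdint>
#include <vector>

// The activity tracking that lets quiet tiles be skipped entirely.
const int TILE = 32;                    // 32x32 cells per tile

struct Tile {
    uint32_t cells[TILE];               // one 32-bit word per row
    bool active = false;                // simulate this generation?
    bool changed = false;               // did the last step alter it?
};

void step_world(std::vector<Tile> &tiles, int tw, int th) {
    for (Tile &t : tiles) {
        t.changed = false;
        if (!t.active) continue;        // skip quiet tiles entirely
        t.active = false;               // re-armed below if needed
        // ... run the fast linear kernel on t here, setting
        // t.changed if any cell flipped ...
    }
    // A tile that changed can affect itself and its 8 neighbors
    // next generation, so wake them all.
    for (int ty = 0; ty < th; ty++)
        for (int tx = 0; tx < tw; tx++) {
            if (!tiles[ty * tw + tx].changed) continue;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int nx = tx + dx, ny = ty + dy;
                    if (nx >= 0 && nx < tw && ny >= 0 && ny < th)
                        tiles[ny * tw + nx].active = true;
                }
        }
}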

Of course, if you keep going down that route, at some point you'll find yourself implementing something like HashLife to optimize your block updates. Whether you really want to go that far is up to you, but I'd recommend starting with something simpler first. :)

Macbi
Posts: 903
Joined: March 29th, 2009, 4:58 am

Re: Quicklife Algorithm

Post by Macbi » January 23rd, 2018, 1:34 pm

It's also possible that you could just use code directly from one of the sources people have mentioned.

moony
Posts: 8
Joined: October 22nd, 2017, 10:55 pm

Re: Quicklife Algorithm

Post by moony » January 23rd, 2018, 2:00 pm

I could, yes. I think I'll go with a semi-naive approach in the form of storing each row.
Keeping the simulation loop tiny won't be very hard (I hope).
I plan on supporting isotropic non-totalistic rules (Hensel notation rules) WITHOUT B0.
In terms of grid boundary, the grid is 612x384. (Seems a little odd, but it has a purpose.)
For dense vs. sparse, I'm after the middle ground.
Just an 8-cell neighbourhood.
A variable number of states will exist.
Single-threaded is all I need for my purposes, but I'm probably going to code GOL to run off to the side of the main simulation, in a separate thread that gets a lock on GOL's data as needed.
A graphical view is necessary, but I'm not going to touch the graphics code that already exists for what I'm doing; it's a huge mess. (I'm rewriting an existing simulation that's quite slow.)

Hardcoding the rule can be done, but I'm simulating multiple rules on the same grid (insane, right? :lol: ) and new rules will be added in the future, so I want to avoid it.
Thanks for all the help, guys.
The reason I'm trying to optimize this in the first place should become obvious after you try The Powder Toy out; it simulates tons of things already. If you look at its GOL simulation code, you'll find it's the most naive thing ever for simulating multiple rules.
