## Golly could be faster?

For scripts to aid with computation or simulation in cellular automata.
mollwollfumble
Posts: 14
Joined: July 18th, 2016, 7:48 pm

### Golly could be faster?

Am new here. First time I ran Golly I thought "cripes that's fast", and it is, but not always. Golly (with default settings) handles "ash" extremely badly. I just wrote a 37 line standalone Fortran program, with no optimisation at all apart from the O3 compiler tag. On my old 32-bit computer it runs in 4 minutes 11 seconds. On the same problem, Golly is more than 5 times as slow!

The problem is 10,000 generations on a 5% random start on a 2048*2048 toroidal grid.

Ideally, a program designed to calculate Life ought to ignore ash completely, or at least ignore still lifes and period 2 oscillators blinker, toad and beacon. So it ought to speed up as the pattern develops towards ash. Is there a switch on Golly that does this?

Code: Select all

``````	parameter(numi=2048,numj=2048)
integer a(0:numi+1,0:numj+1)
a=0
do j=1,numj
do i=1,numi
if(rand().lt.0.05) then
a(i,j)=1
end if
end do
end do
do itime=1,10000
do j=1,numj
do i=1,numi
icount=a(i-1,j-1)+a(i,j-1)+a(i+1,j-1)+a(i-1,j)+a(i+1,j)+a(i-1,j+1)+a(i,j+1)+a(i+1,j+1)
if(a(i,j).eq.0) then
if(icount.eq.3) then
a(i,j)=1
end if
else
if(icount.lt.2.or.icount.gt.3) then
a(i,j)=0
end if
end if
end do
end do
do i=1,numi
a(i,0)=a(i,numj)
a(i,numj+1)=a(i,1)
end do
do j=0,numj+1
a(0,j)=a(numi,j)
a(numi+1,j)=a(1,j)
end do
end do
stop
end
``````

Andrew
Moderator
Posts: 770
Joined: June 2nd, 2009, 2:08 am
Location: Melbourne, Australia
Contact:

### Re: Golly could be faster?

mollwollfumble wrote:... The problem is 10,000 generations on a 5% random start on a 2048*2048 toroidal grid.
Copied from Help > Bounded Grids:

Pattern generation in a bounded grid is slower than in an unbounded grid. This is because all the current algorithms have been designed to work with unbounded grids, so Golly has to do extra work to create the illusion of a bounded grid.

By "extra work" I mean that any live cells at the requested boundary have to be copied to the appropriate places outside the joined edges to calculate the next generation, then any live cells outside the boundary have to be deleted. The bigger the bounded grid, the more work. Also, the step increment has to be limited to 1, so we lose most of HashLife's advantages.
... So it ought to speed up as the pattern develops towards ash. Is there a switch on Golly that does this?
Use HashLife and don't use a bounded grid. EDIT: Note that a bounded grid is not Golly's "default setting". The default rules in all algos don't specify a bounded grid.

dvgrn
Moderator
Posts: 6708
Joined: May 17th, 2009, 11:00 pm
Contact:

### Re: Golly could be faster?

Andrew wrote:Use HashLife and don't use a bounded grid. EDIT: Note that a bounded grid is not Golly's "default setting". The default rules in all algos don't specify a bounded grid.
For the particular problem mentioned (2048x2048 toroidal grid) it would be easy -- or relatively easy at least -- to get an algorithm working that would simulate the bounded grid and still keep all the speed benefits of Hashlife. It should be possible to tweak Hashlife to set up a 2048x2048 tile as its own next-door neighbor in all eight directions.

An MxN bounded grid with M or N not being a power of two, seems like a somewhat taller order. There would still be some hope of arranging for the algorithm to ignore specific rows or columns in different-sized hashtiles, at some relatively minor cost in efficiency... but I'm not going to be tempted to try to code anything like that until my winning lottery checks start coming in and I have more time available to think. Which may be a while, unless someone else starts buying tickets for me.

mollwollfumble
Posts: 14
Joined: July 18th, 2016, 7:48 pm

### Re: Golly could be faster?

dvgrn wrote:
Andrew wrote:Use HashLife and don't use a bounded grid. EDIT: Note that a bounded grid is not Golly's "default setting". The default rules in all algos don't specify a bounded grid.
For the particular problem mentioned (2048x2048 toroidal grid) it would be easy -- or relatively easy at least -- to get an algorithm working that would simulate the bounded grid and still keep all the speed benefits of Hashlife. It should be possible to tweak Hashlife to set up a 2048x2048 tile as its own next-door neighbor in all eight directions.

An MxN bounded grid with M or N not being a power of two, seems like a somewhat taller order. There would still be some hope of arranging for the algorithm to ignore specific rows or columns in different-sized hashtiles, at some relatively minor cost in efficiency... but I'm not going to be tempted to try to code anything like that until my winning lottery checks start coming in and I have more time available to think. Which may be a while, unless someone else starts buying tickets for me.
Thanks all. BTW, I've lost all credibility by posting that screamingly wrong script above.

Weirdly, Golly is 6.75 times faster on Android than on Windows for this problem. Could Windows be the problem?

The corrected un-optimised (except for -O3 flag) standalone is as follows. For this specific problem on Cygwin it runs 11.75 times faster than Golly does on Windows. I'm now writing a faster standalone designed to ignore all still lifes and period 2 oscillators in the ash, but it isn't working yet.

Code: Select all

``````! separate odd and even generations
parameter(numi=2048,numj=2048)
integer a(0:numi+1,0:numj+1),b(0:numi+1,0:numj+1)
a=0
! setup
do j=1,numj
do i=1,numi
if(rand().lt.0.05) then
a(i,j)=1
end if
end do
end do
call toroid(a)
! start
do itime=1,5000
call twostep(a,b)
call twostep(b,a)
end do
! output
open(1,file='two_end.txt',status='unknown')
do j=0,numj+1
write(1,'(2050i1)') (a(i,j),i=0,numi+1)
end do
close(1)
stop
end

subroutine twostep(a,b)  ! input a, output b
parameter(numi=2048,numj=2048)
integer a(0:numi+1,0:numj+1),b(0:numi+1,0:numj+1)
! main routine
do j=1,numj
do i=1,numi
icount=a(i-1,j-1)+a(i,j-1)+a(i+1,j-1)+a(i-1,j)+a(i+1,j)+a(i-1,j+1)+a(i,j+1)+a(i+1,j+1)
if(a(i,j).eq.0) then
if(icount.eq.3) then
b(i,j)=1
else
b(i,j)=0
end if
else
if(icount.eq.2.or.icount.eq.3) then
b(i,j)=1
else
b(i,j)=0
end if
end if
end do
end do
call toroid(b)
return
end

subroutine toroid(a)
parameter(numi=2048,numj=2048)
integer a(0:numi+1,0:numj+1)
! apply toroidal boundary conditions
do i=1,numi
a(i,0)=a(i,numj)
a(i,numj+1)=a(i,1)
end do
do j=0,numj+1
a(0,j)=a(numi,j)
a(numi+1,j)=a(1,j)
end do
return
end
``````

wildmyron
Posts: 1397
Joined: August 9th, 2013, 12:45 am

### Re: Golly could be faster?

Curious about the relative performance in Golly, I benchmarked the pattern generation time for a variety of random fill pattern sizes using both QuickLife and HashLife, for bounded and unbounded grids. All trials used a random fill with density of 5%, step 2^10 for 10 steps (total 10240 gen), maximum memory for algo set to 2000MB, and the average time from a number of repetitions recorded. For some trials the screen update probably had an affect on the timing but I don't think this weakens the comparison. I ran the test on a mostly idle Win7 laptop with Intel Core I5 CPU and 4GB of RAM. As far as I could tell there was no adverse memory pressure during any trial.

In general the results show that QuickLife in an unbounded grid gives the fastest generation times for this type of pattern and that bounded grids give unpredictable performance. If the pattern was run for considerably longer then HashLife would win out for sure, but this experiment is about determining the fate of random patterns evolved to a point where they are only mostly stabilised.

Something strange is going on at 2048x2048 with QuickLife in a bounded grid.

Here are the results:

Code: Select all

``````Algorithm   Size    Bound?  Nreps   Time (s)
============================================
Quicklife   512     No      20      0.06
Quicklife   1024    No      20      0.12
Quicklife   1536    No      20      0.22
Quicklife   2048    No      20      0.37
Quicklife   2560    No      20      0.54
Quicklife   3072    No      20      0.71
Quicklife   3584    No      20      0.96
Quicklife   4096    No      20      1.24

Hashlife    512     No      20      0.11
Hashlife    1024    No      20      0.69
Hashlife    1536    No      20      1.38
Hashlife    2048    No      20      2.8
Hashlife    2560    No      10      4.1
Hashlife    3072    No      10      6.6
Hashlife    3584    No      10      8.0
Hashlife    4096    No      10      10.1

Quicklife   512     Yes     5       17.2
Quicklife   1024    Yes     5       20
Quicklife   1536    Yes     5       22
Quicklife   2048    Yes     3       1310

Hashlife    512     Yes     10      3.1
Hashlife    1024    Yes     5       10.5
Hashlife    1536    Yes     5       21
Hashlife    2048    Yes     5       32
Hashlife    2560    Yes     5       57
Hashlife    3072    Yes     5       78``````
Here's the script I ran to get these results. Algo should be set manually and script parameters changed for each trial. This could easily be automated but I didn't bother.

Code: Select all

``````import golly as g
import time

bndGrid = False
Nreps = 10
size = 512*4
rect = [-size/2, -size/2, size, size]
rulestr = "B3/S23"
if bndGrid:
rulestr += ":T%d,%d" % (size, size)
g.setrule(rulestr)

tt = []
for rep in xrange(0,Nreps):
g.new("Pattern generation time trial")
g.select(rect)
g.randfill(5)
g.select([])
g.setmag(-2)
g.setstep(10)
g.update()
start_time = time.clock()
for ii in xrange(0, 10):
g.step()
g.update()
end_time = time.clock()
tt.append(end_time - start_time)
g.show("Trial %d elapsed time: %.2f" % (rep+1, tt[rep]))

g.show( "Elapsed times for %d trials: %.2fs (mean), %.2fs (min), %.2fs (max)" %
(Nreps, sum(tt)/len(tt), min(tt), max(tt)) )``````
@mollwollfumble As you can see from the above results, if you must use a bounded grid then you can get reasonable performance from Golly if you use HashLife. But if you want to run lots of trials then you should switch to QuickLife in an unbounded grid. You can avoid most boundary effects by padding the random pattern outside the area of interest. The size of the padding required can be determined with a script from another thread which explored the propagation speed and distance that a 1 bit flip produces in a large random soup. Sorry, I don't have a link to the thread.
The latest version of the 5S Project contains over 226,000 spaceships. There is also a GitHub mirror of the collection. Tabulated pages up to period 160 (out of date) are available on the LifeWiki.

calcyman
Posts: 2198
Joined: June 1st, 2009, 4:32 pm

### Re: Golly could be faster?

Tile the plane with copies of your 2048-by-2048 starting pattern, and run in HashLife.
What do you do with ill crystallographers? Take them to the mono-clinic!

dvgrn
Moderator
Posts: 6708
Joined: May 17th, 2009, 11:00 pm
Contact:

### Re: Golly could be faster?

calcyman wrote:Tile the plane with copies of your 2048-by-2048 starting pattern, and run in HashLife.
It's easy to generate a macrocell file that defines an area of 2048x2048 tiles as big as you might want. It would run reasonably fast at least at first, and the center tile would contain the correct descendant for 2048*N ticks (where N is the number of 2048x2048 tiles padding the central one on each side).

However, it seems like the usefulness of this trick would be limited for longer runs, because the chaotic mess eating its way in from the edges of the tiled area would start taking up most of HashLife's attention after a while. A modified HashLife algorithm, where you could tile the T=0 universe with non-empty initial tiles of any 2^x size, would avoid that problem very nicely.

mollwollfumble
Posts: 14
Joined: July 18th, 2016, 7:48 pm

### Re: Golly could be faster?

wildmyron wrote: Something strange is going on at 2048x2048 with QuickLife in a bounded grid.

@mollwollfumble As you can see from the above results, if you must use a bounded grid then you can get reasonable performance from Golly if you use HashLife. But if you want to run lots of trials then you should switch to QuickLife in an unbounded grid. You can avoid most boundary effects by padding the random pattern outside the area of interest. The size of the padding required can be determined with a script from another thread which explored the propagation speed and distance that a 1 bit flip produces in a large random soup. Sorry, I don't have a link to the thread.
My only reason for choosing 2048*2048 for the size of the toroid is because that's what Achim Flammenkamp used. I'm choosing 5% initial density because that gives room for initial patterns to expand without too much interference, the final ash density is 30% of maximum.

Thanks for that, but I've got my standalone code working now (I think). It totally ignores oscillators with period 2 and limits its calculation on each row to between the first active cell on that row and the last active cell on that row, taking into account that the active zone can expand at the speed of light.

Run times:
Golly on Windows. Quicklife, 23m43s for 10,000 generations
Golly on Android. Rule Loader. 3m31s for 10,000 generations
Optimised standalone on Cygwin. 6.4s for 20,000 generations.

Code for optimised standalone is:

Code: Select all

``````	parameter(numi=2048,numj=2048)
integer a(0:numi+1,0:numj+1),b(0:numi+1,0:numj+1)
integer first0(numj),last0(numj)
a=0
!	  call system_clock(COUNT=iseed)
!	  call srand(iseed)
do j=1,numj
do i=1,numi
if(rand().lt.0.05) then
a(i,j)=1
end if
end do
end do
call toroid(a)
b=a
first0=1
last0=numi
! main calculation
do itime=1,10000
call twosteprow(a,b,first0,last0)
call twosteprow(b,a,first0,last0)
end do
! find population
ipop=0
do j=1,numj
do i=1,numi
if(a(i,j).eq.1) then
ipop=ipop+1
end if
end do
end do
print*,itime*2-2,ipop
stop
end

subroutine twosteprow(a,b,first0,last0)

parameter(numi=2048,numj=2048)
integer a(0:numi+1,0:numj+1),b(0:numi+1,0:numj+1)
integer first(0:numj+1),last(0:numj+1),first0(numj),last0(numj)
do j=0,numj
first(j)=numi+1
last(j)=0
do i=first0(j),last0(j)
icount=a(i-1,j-1)+a(i,j-1)+a(i+1,j-1)+a(i-1,j)+a(i+1,j)+a(i-1,j+1)+a(i,j+1)+a(i+1,j+1)
if(a(i,j).eq.0) then  ! most common case, empty square
if(icount.eq.3) then  ! cell born
if(b(i,j).eq.0) then  ! note that b ne a for period 2 oscillators
if(first(j).eq.numi+1) first(j)=i
last(j)=i
b(i,j)=1
end if
if(b(i,j).eq.1) then
if(first(j).eq.numi+1) first(j)=i
last(j)=i
b(i,j)=0
end if
end if
else  ! a(i,j)=1
if(icount.eq.2.or.icount.eq.3) then  ! cell survives
if(b(i,j).eq.0) then
if(first(j).eq.numi+1) first(j)=i
last(j)=i
b(i,j)=1
end if
else  ! cell dies
if(b(i,j).eq.1) then
if(first(j).eq.numi+1) first(j)=i
last(j)=i
b(i,j)=0
end if
end if
end if
end do
end do
call toroid(b)
! expand first & last at speed of light to get new first0 and last0.
first(0)=first(numj)
first(numj+1)=first(1)
last(0)=last(numj)
last(numj+1)=last(1)
do j=1,numj
first0(j)=min(first(j-1),first(j),first(j+1))-1
last0(j)=max(last(j-1),last(j),last(j+1))+1
if(first0(j).lt.1.or.last0(j).gt.numi) then  ! cross toroidal boundaries
first0(j)=1
last0(j)=numi
end if
end do
return
end

subroutine toroid(a)
parameter(numi=2048,numj=2048)
integer a(0:numi+1,0:numj+1)
! apply toroidal boundary conditions
do i=1,numi
a(i,0)=a(i,numj)
a(i,numj+1)=a(i,1)
end do
do j=0,numj+1
a(0,j)=a(numi,j)
a(numi+1,j)=a(1,j)
end do
return
end
``````

mollwollfumble
Posts: 14
Joined: July 18th, 2016, 7:48 pm

### Re: Golly could be faster?

My standalone still isn't working - it's an annoying feature of the way that the compiler optimises fortran that a program that runs perfectly with debugging in place immediately misbehaves when debugging is switched off.

I've confirmed the extreme misbehaviour of Quicklife for this problem, and great performance by Hashlife.

Windows, quicklife - 10,000 generations 23m 43s
Windows, hashlife - 20,000 generations 0m 40s
Android, quicklife - 10,000 generations 106m 50s
Android, hashlife - 20,000 generations 2m 46s

That's an enormous difference in speed.

mollwollfumble
Posts: 14
Joined: July 18th, 2016, 7:48 pm

### Re: Golly could be faster?

Confirmed that 6.0 seconds for 20,000 generations of this problem is possible in standalone Fortran (without multi-threading). As against 38 seconds for Golly hash hyperspeed on same computer.

Adding in search for transients "queen bee" and "switch engine" at every generation increases this to a pleasant 11.6 seconds.