Golly could be faster?

mollwollfumble · Post by **mollwollfumble** » July 27th, 2016, 9:24 pm

Am new here. First time I ran Golly I thought "cripes that's fast", and it is, but not always. Golly (with default settings) handles "ash" extremely badly. I just wrote a 37 line standalone Fortran program, with no optimisation at all apart from the O3 compiler tag. On my old 32-bit computer it runs in 4 minutes 11 seconds. On the same problem, Golly is more than 5 times as slow!

The problem is 10,000 generations on a 5% random start on a 2048*2048 toroidal grid.

Ideally, a program designed to calculate Life ought to ignore ash completely, or at least ignore still lifes and period 2 oscillators blinker, toad and beacon. So it ought to speed up as the pattern develops towards ash. Is there a switch on Golly that does this?

Code: Select all

	parameter(numi=2048,numj=2048)
	integer a(0:numi+1,0:numj+1)
	a=0
	do j=1,numj
	  do i=1,numi
	    if(rand().lt.0.05) then
	      a(i,j)=1
	    end if
	  end do
	end do
	do itime=1,10000
	  do j=1,numj
	    do i=1,numi
	      icount=a(i-1,j-1)+a(i,j-1)+a(i+1,j-1)+a(i-1,j)+a(i+1,j)+a(i-1,j+1)+a(i,j+1)+a(i+1,j+1)
	      if(a(i,j).eq.0) then
	        if(icount.eq.3) then
	          a(i,j)=1
	        end if
	      else
	        if(icount.lt.2.or.icount.gt.3) then
	          a(i,j)=0
	        end if
	      end if
	    end do
	  end do
	  do i=1,numi
	    a(i,0)=a(i,numj)
	    a(i,numj+1)=a(i,1)
	  end do
	  do j=0,numj+1
	    a(0,j)=a(numi,j)
	    a(numi+1,j)=a(1,j)
	  end do
	end do
	stop
	end

Post by **Andrew** » July 27th, 2016, 10:10 pm

mollwollfumble wrote:... The problem is 10,000 generations on a 5% random start on a 2048*2048 toroidal grid.

Copied from Help > Bounded Grids:

Pattern generation in a bounded grid is slower than in an unbounded grid. This is because all the current algorithms have been designed to work with unbounded grids, so Golly has to do extra work to create the illusion of a bounded grid.

By "extra work" I mean that any live cells at the requested boundary have to be copied to the appropriate places outside the joined edges to calculate the next generation, then any live cells outside the boundary have to be deleted. The bigger the bounded grid, the more work. Also, the step increment has to be limited to 1, so we lose most of HashLife's advantages.

... So it ought to speed up as the pattern develops towards ash. Is there a switch on Golly that does this?

Use HashLife and don't use a bounded grid. EDIT: Note that a bounded grid is not Golly's "default setting". The default rules in all algos don't specify a bounded grid.

Post by **dvgrn** » July 28th, 2016, 10:43 am

Andrew wrote:Use HashLife and don't use a bounded grid. EDIT: Note that a bounded grid is not Golly's "default setting". The default rules in all algos don't specify a bounded grid.

For the particular problem mentioned (2048x2048 toroidal grid) it would be easy -- or relatively easy at least -- to get an algorithm working that would simulate the bounded grid and still keep all the speed benefits of Hashlife. It should be possible to tweak Hashlife to set up a 2048x2048 tile as its own next-door neighbor in all eight directions.

An MxN bounded grid with M or N not being a power of two, seems like a somewhat taller order. There would still be some hope of arranging for the algorithm to ignore specific rows or columns in different-sized hashtiles, at some relatively minor cost in efficiency... but I'm not going to be tempted to try to code anything like that until my winning lottery checks start coming in and I have more time available to think. Which may be a while, unless someone else starts buying tickets for me.

mollwollfumble · Post by **mollwollfumble** » July 28th, 2016, 10:54 pm

dvgrn wrote:
Andrew wrote:Use HashLife and don't use a bounded grid. EDIT: Note that a bounded grid is not Golly's "default setting". The default rules in all algos don't specify a bounded grid.
For the particular problem mentioned (2048x2048 toroidal grid) it would be easy -- or relatively easy at least -- to get an algorithm working that would simulate the bounded grid and still keep all the speed benefits of Hashlife. It should be possible to tweak Hashlife to set up a 2048x2048 tile as its own next-door neighbor in all eight directions.

An MxN bounded grid with M or N not being a power of two, seems like a somewhat taller order. There would still be some hope of arranging for the algorithm to ignore specific rows or columns in different-sized hashtiles, at some relatively minor cost in efficiency... but I'm not going to be tempted to try to code anything like that until my winning lottery checks start coming in and I have more time available to think. Which may be a while, unless someone else starts buying tickets for me.

Thanks all. BTW, I've lost all credibility by posting that screamingly wrong script above.

Weirdly, Golly is 6.75 times faster on Android than on Windows for this problem. Could Windows be the problem?

The corrected un-optimised (except for -O3 flag) standalone is as follows. For this specific problem on Cygwin it runs 11.75 times faster than Golly does on Windows. I'm now writing a faster standalone designed to ignore all still lifes and period 2 oscillators in the ash, but it isn't working yet.

Code: Select all

! separate odd and even generations
	parameter(numi=2048,numj=2048)
	integer a(0:numi+1,0:numj+1),b(0:numi+1,0:numj+1)
	a=0
! setup
	do j=1,numj
	  do i=1,numi
	    if(rand().lt.0.05) then
	      a(i,j)=1
	    end if
	  end do
	end do
	call toroid(a)
! start
	do itime=1,5000
	  call twostep(a,b)
	  call twostep(b,a)
	end do
! output
	open(1,file='two_end.txt',status='unknown')
	do j=0,numj+1
	  write(1,'(2050i1)') (a(i,j),i=0,numi+1)
	end do
	close(1)
	stop
	end

	subroutine twostep(a,b)  ! input a, output b
	parameter(numi=2048,numj=2048)
	integer a(0:numi+1,0:numj+1),b(0:numi+1,0:numj+1)
! main routine
	do j=1,numj
	  do i=1,numi
	    icount=a(i-1,j-1)+a(i,j-1)+a(i+1,j-1)+a(i-1,j)+a(i+1,j)+a(i-1,j+1)+a(i,j+1)+a(i+1,j+1)
	    if(a(i,j).eq.0) then
	      if(icount.eq.3) then
	        b(i,j)=1
	      else
	        b(i,j)=0
	      end if
	    else
	      if(icount.eq.2.or.icount.eq.3) then
	        b(i,j)=1
	      else
	        b(i,j)=0
	      end if
	    end if
	  end do
	end do
	call toroid(b)
	return
	end

	subroutine toroid(a)
	parameter(numi=2048,numj=2048)
	integer a(0:numi+1,0:numj+1)
! apply toroidal boundary conditions
	do i=1,numi
	  a(i,0)=a(i,numj)
	  a(i,numj+1)=a(i,1)
	end do
	do j=0,numj+1
	  a(0,j)=a(numi,j)
	  a(numi+1,j)=a(1,j)
	end do
	return
	end

wildmyron · Post by **wildmyron** » July 29th, 2016, 5:02 am

Curious about the relative performance in Golly, I benchmarked the pattern generation time for a variety of random fill pattern sizes using both QuickLife and HashLife, for bounded and unbounded grids. All trials used a random fill with density of 5%, step 2^10 for 10 steps (total 10240 gen), maximum memory for algo set to 2000MB, and the average time from a number of repetitions recorded. For some trials the screen update probably had an affect on the timing but I don't think this weakens the comparison. I ran the test on a mostly idle Win7 laptop with Intel Core I5 CPU and 4GB of RAM. As far as I could tell there was no adverse memory pressure during any trial.

In general the results show that QuickLife in an unbounded grid gives the fastest generation times for this type of pattern and that bounded grids give unpredictable performance. If the pattern was run for considerably longer then HashLife would win out for sure, but this experiment is about determining the fate of random patterns evolved to a point where they are only mostly stabilised.

Something strange is going on at 2048x2048 with QuickLife in a bounded grid.

Here are the results:

Code: Select all

Algorithm   Size    Bound?  Nreps   Time (s)
============================================
Quicklife   512     No      20      0.06
Quicklife   1024    No      20      0.12
Quicklife   1536    No      20      0.22
Quicklife   2048    No      20      0.37
Quicklife   2560    No      20      0.54
Quicklife   3072    No      20      0.71
Quicklife   3584    No      20      0.96
Quicklife   4096    No      20      1.24

Hashlife    512     No      20      0.11
Hashlife    1024    No      20      0.69
Hashlife    1536    No      20      1.38
Hashlife    2048    No      20      2.8
Hashlife    2560    No      10      4.1
Hashlife    3072    No      10      6.6
Hashlife    3584    No      10      8.0
Hashlife    4096    No      10      10.1

Quicklife   512     Yes     5       17.2
Quicklife   1024    Yes     5       20
Quicklife   1536    Yes     5       22
Quicklife   2048    Yes     3       1310

Hashlife    512     Yes     10      3.1
Hashlife    1024    Yes     5       10.5
Hashlife    1536    Yes     5       21
Hashlife    2048    Yes     5       32
Hashlife    2560    Yes     5       57
Hashlife    3072    Yes     5       78

Here's the script I ran to get these results. Algo should be set manually and script parameters changed for each trial. This could easily be automated but I didn't bother.

Code: Select all

import golly as g
import time

bndGrid = False
Nreps = 10
size = 512*4
rect = [-size/2, -size/2, size, size]
rulestr = "B3/S23"
if bndGrid:
    rulestr += ":T%d,%d" % (size, size)
g.setrule(rulestr)

tt = []
for rep in xrange(0,Nreps):
    g.new("Pattern generation time trial")
    g.select(rect)
    g.randfill(5)
    g.select([])
    g.setmag(-2)
    g.setstep(10)
    g.update()
    start_time = time.clock()
    for ii in xrange(0, 10):
        g.step()
        g.update()
    end_time = time.clock()
    tt.append(end_time - start_time)
    g.show("Trial %d elapsed time: %.2f" % (rep+1, tt[rep]))

g.show( "Elapsed times for %d trials: %.2fs (mean), %.2fs (min), %.2fs (max)" % 
        (Nreps, sum(tt)/len(tt), min(tt), max(tt)) )

@mollwollfumble As you can see from the above results, if you must use a bounded grid then you can get reasonable performance from Golly if you use HashLife. But if you want to run lots of trials then you should switch to QuickLife in an unbounded grid. You can avoid most boundary effects by padding the random pattern outside the area of interest. The size of the padding required can be determined with a script from another thread which explored the propagation speed and distance that a 1 bit flip produces in a large random soup. Sorry, I don't have a link to the thread.

Post by **calcyman** » July 31st, 2016, 4:00 am

Tile the plane with copies of your 2048-by-2048 starting pattern, and run in HashLife.

Post by **dvgrn** » July 31st, 2016, 9:22 am

calcyman wrote:Tile the plane with copies of your 2048-by-2048 starting pattern, and run in HashLife.

It's easy to generate a macrocell file that defines an area of 2048x2048 tiles as big as you might want. It would run reasonably fast at least at first, and the center tile would contain the correct descendant for 2048*N ticks (where N is the number of 2048x2048 tiles padding the central one on each side).

However, it seems like the usefulness of this trick would be limited for longer runs, because the chaotic mess eating its way in from the edges of the tiled area would start taking up most of HashLife's attention after a while. A modified HashLife algorithm, where you could tile the T=0 universe with non-empty initial tiles of any 2^x size, would avoid that problem very nicely.

mollwollfumble · Post by **mollwollfumble** » August 2nd, 2016, 8:00 pm

wildmyron wrote: Something strange is going on at 2048x2048 with QuickLife in a bounded grid.

@mollwollfumble As you can see from the above results, if you must use a bounded grid then you can get reasonable performance from Golly if you use HashLife. But if you want to run lots of trials then you should switch to QuickLife in an unbounded grid. You can avoid most boundary effects by padding the random pattern outside the area of interest. The size of the padding required can be determined with a script from another thread which explored the propagation speed and distance that a 1 bit flip produces in a large random soup. Sorry, I don't have a link to the thread.

My only reason for choosing 2048*2048 for the size of the toroid is because that's what Achim Flammenkamp used. I'm choosing 5% initial density because that gives room for initial patterns to expand without too much interference, the final ash density is 30% of maximum.

Thanks for that, but I've got my standalone code working now (I think). It totally ignores oscillators with period 2 and limits its calculation on each row to between the first active cell on that row and the last active cell on that row, taking into account that the active zone can expand at the speed of light.

Run times:
Golly on Windows. Quicklife, 23m43s for 10,000 generations
Golly on Android. Rule Loader. 3m31s for 10,000 generations
Optimised standalone on Cygwin. 6.4s for 20,000 generations.

Code for optimised standalone is:

Code: Select all

	parameter(numi=2048,numj=2048)
	integer a(0:numi+1,0:numj+1),b(0:numi+1,0:numj+1)
	integer first0(numj),last0(numj)
	a=0
!	  call system_clock(COUNT=iseed)
!	  call srand(iseed)
	do j=1,numj
	  do i=1,numi
	    if(rand().lt.0.05) then
	      a(i,j)=1
	    end if
	  end do
	end do
	call toroid(a)
	b=a
	first0=1
	last0=numi
! main calculation
	do itime=1,10000
	  call twosteprow(a,b,first0,last0)
	  call twosteprow(b,a,first0,last0)
	end do
! find population
	ipop=0
	do j=1,numj
	  do i=1,numi
	    if(a(i,j).eq.1) then
	      ipop=ipop+1
	    end if
	  end do
	end do
	print*,itime*2-2,ipop
	stop
	end

	subroutine twosteprow(a,b,first0,last0)

	parameter(numi=2048,numj=2048)
	integer a(0:numi+1,0:numj+1),b(0:numi+1,0:numj+1)
	integer first(0:numj+1),last(0:numj+1),first0(numj),last0(numj)
	do j=0,numj
	  first(j)=numi+1
	  last(j)=0
	  do i=first0(j),last0(j)
	    icount=a(i-1,j-1)+a(i,j-1)+a(i+1,j-1)+a(i-1,j)+a(i+1,j)+a(i-1,j+1)+a(i,j+1)+a(i+1,j+1)
	    if(a(i,j).eq.0) then  ! most common case, empty square
	      if(icount.eq.3) then  ! cell born
	        if(b(i,j).eq.0) then  ! note that b ne a for period 2 oscillators
	          if(first(j).eq.numi+1) first(j)=i
	          last(j)=i
	          b(i,j)=1
	        end if
	      else  ! cell stays dead
	        if(b(i,j).eq.1) then
	          if(first(j).eq.numi+1) first(j)=i
	          last(j)=i
	          b(i,j)=0
	        end if
	      end if
	    else  ! a(i,j)=1
	      if(icount.eq.2.or.icount.eq.3) then  ! cell survives
	        if(b(i,j).eq.0) then
	          if(first(j).eq.numi+1) first(j)=i
	          last(j)=i
	          b(i,j)=1
	        end if
	      else  ! cell dies
	        if(b(i,j).eq.1) then
	          if(first(j).eq.numi+1) first(j)=i
	          last(j)=i
	          b(i,j)=0
	        end if
	      end if
	    end if
	  end do
	end do
	call toroid(b)
! expand first & last at speed of light to get new first0 and last0.
	first(0)=first(numj)
	first(numj+1)=first(1)
	last(0)=last(numj)
	last(numj+1)=last(1)
	do j=1,numj
	  first0(j)=min(first(j-1),first(j),first(j+1))-1
	  last0(j)=max(last(j-1),last(j),last(j+1))+1
	  if(first0(j).lt.1.or.last0(j).gt.numi) then  ! cross toroidal boundaries
	    first0(j)=1
	    last0(j)=numi
	  end if
	end do
	return
	end

	subroutine toroid(a)
	parameter(numi=2048,numj=2048)
	integer a(0:numi+1,0:numj+1)
! apply toroidal boundary conditions
	do i=1,numi
	  a(i,0)=a(i,numj)
	  a(i,numj+1)=a(i,1)
	end do
	do j=0,numj+1
	  a(0,j)=a(numi,j)
	  a(numi+1,j)=a(1,j)
	end do
	return
	end

mollwollfumble · Post by **mollwollfumble** » August 3rd, 2016, 4:32 pm

My standalone still isn't working - it's an annoying feature of the way that the compiler optimises fortran that a program that runs perfectly with debugging in place immediately misbehaves when debugging is switched off.

I've confirmed the extreme misbehaviour of Quicklife for this problem, and great performance by Hashlife.

Windows, quicklife - 10,000 generations 23m 43s
Windows, hashlife - 20,000 generations 0m 40s
Android, quicklife - 10,000 generations 106m 50s
Android, hashlife - 20,000 generations 2m 46s

That's an enormous difference in speed.

mollwollfumble · Post by **mollwollfumble** » August 6th, 2016, 10:05 pm

Confirmed that 6.0 seconds for 20,000 generations of this problem is possible in standalone Fortran (without multi-threading). As against 38 seconds for Golly hash hyperspeed on same computer.

Adding in search for transients "queen bee" and "switch engine" at every generation increases this to a pleasant 11.6 seconds.

ConwayLife.com

Golly could be faster?

Golly could be faster?

Re: Golly could be faster?

Re: Golly could be faster?

Re: Golly could be faster?

Re: Golly could be faster?

Re: Golly could be faster?

Re: Golly could be faster?

Re: Golly could be faster?

Re: Golly could be faster?

Re: Golly could be faster?