Still life frequency estimator

For scripts to aid with computation or simulation in cellular automata.
Post Reply
User avatar
Kazyan
Posts: 1247
Joined: February 6th, 2014, 11:02 pm

Still life frequency estimator

Post by Kazyan » October 19th, 2016, 4:27 am

Posting this so I don't forget to improve it later. Given a still life placed in the active layer and with that layer being otherwise empty, this script estimates the frequency of that still life in all ash objects.

Currently, it's pretty bad--it has a variance of about one order of magnitude with respect to Catagolue's frequencies, and it's completely clueless for common objects. This is partly because it's heavily biased towards rare objects and partly because it's based on linear regression. I intend to rewrite it to use a really rudimentary version of a convolutional neural network, once I figure out how the relevant Python module works.

Code: Select all

# Still life frequency estimator
# Estimates the frequency of occurrence of a still life--not terribly well, though. Has a variance of 0.9537 orders of magnitude, based on multilinear regression from Catagolue's text census on October 19th 2016, at 5:10 am UTC. Still lifes in the census with 9 or fewer occurrences were omitted from the regression to obtain tile weights, because of 'survivor bias'.
# Tanner Jacobi, Oct 2016

import golly as g

# The current algorithm is to take snapshots of the still life as 3x3 'tiles', going over every such 3x3 square that can capture a live cell of the still life. Then, each type of tile is assigned a weight, and the weighted tiles are summed to get the final frequency.
Tiles = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] # the total numbers of each of the 46 kinds of tiles in the SL, initialized as zero.
TileIdentity = dict({ # Containing the hashes of each kind of possible 3x3 tile for a still life in all orientations, and which tile that hash corresponds to.
'31415962' : 0, # This is the empty tile; its weight is fixed at zero.
'529404522' : 1,
'529404520' : 1,
'527404518' : 1,
'527404516' : 1,
'529404523' : 2,
'530404527' : 2,
'527404517' : 2,
'530404525' : 2,
'911400120' : 3,
'-1938807252' : 3,
'-254392440' : 3,
'909400116' : 3,
'909400118' : 4,
'-1938807250' : 4,
'911400123' : 5,
'-1939807255' : 5,
'-979772411' : 5,
'1493296399' : 5,
'190020161' : 5,
'54536495' : 5,
'-254392437' : 5,
'912400125' : 5,
'912400127' : 6,
'-1938807249' : 6,
'1493296397' : 6,
'192020169' : 6,
'-1939807253' : 6,
'192020171' : 6,
'54536493' : 6,
'909400117' : 6,
'189020160' : 7,
'189020162' : 7,
'54536492' : 7,
'1493296398' : 7,
'192020168' : 8,
'1494296400' : 8,
'-71026502' : 9,
'1031927222' : 9,
'1990468510' : 9,
'-207510174' : 9,
'2057800913' : 10,
'1034927231' : 10,
'1273088563' : 10,
'1047164657' : 10,
'1452738439' : 10,
'-1445635155' : 10,
'1990468509' : 10,
'-931890141' : 10,
'-207510176' : 11,
'1031927220' : 11,
'-208510179' : 12,
'1031927221' : 12,
'1047164659' : 12,
'-1296297725' : 12,
'311547265' : 12,
'-1296297727' : 12,
'-1445635153' : 12,
'-207510173' : 12,
'-1296297726' : 13,
'1044164650' : 13,
'-571917758' : 14,
'-1295297724' : 14,
'-1445635154' : 14,
'1047164656' : 14,
'-864283798' : 15,
'-306988909' : 16,
'-5420167' : 16,
'2020736473' : 16,
'-574079009' : 16,
'-306988911' : 16,
'124326539' : 16,
'-1965621189' : 16,
'-1444180071' : 16,
'-5420165' : 17,
'-1581663745' : 17,
'1212474249' : 17,
'-581956263' : 17,
'-1444180069' : 17,
'1371614307' : 17,
'514068701' : 17,
'-864283799' : 17,
'-726800124' : 18,
'-726800122' : 18,
'1650091480' : 18,
'231741166' : 18,
'-305988906' : 19,
'-896696792' : 19,
'1330209402' : 19,
'-574079012' : 19,
'-305988908' : 19,
'841706486' : 19,
'548755312' : 19,
'1965322198' : 19,
'-1447180078' : 20,
'2088994254' : 20,
'637072770' : 20,
'-581956262' : 20,
'-306988910' : 21,
'-1981974514' : 21,
'-13923114' : 21,
'1452577190' : 21,
'-1444180072' : 22,
'1951510578' : 22,
'-685520528' : 22,
'514068702' : 22,
'-5420166' : 22,
'1937854220' : 22,
'-1405465294' : 22,
'518562268' : 22,
'-305988905' : 23,
'1959020415' : 23,
'-930098307' : 23,
'-711562687' : 23,
'-1447180079' : 24,
'1234130631' : 24,
'1468459055' : 24,
'231741165' : 24,
'-723800113' : 24,
'924711509' : 24,
'919400305' : 24,
'1239942227' : 24,
'-726800123' : 25,
'-1260594559' : 25,
'2083681827' : 25,
'1090604799' : 25,
'1965322196' : 26,
'-896696790' : 26,
'1239448672' : 26,
'1212474250' : 26,
'1452577188' : 27,
'1371614304' : 27,
'-1981974516' : 27,
'143423708' : 27,
'-711562685' : 28,
'1234130629' : 28,
'1090604797' : 28,
'1650091483' : 28,
'1959020413' : 28,
'-485638781' : 28,
'-1260594557' : 28,
'1239942225' : 28,
'1964322193' : 29,
'-896696789' : 29,
'-80307177' : 29,
'-1405465295' : 29,
'1954510587' : 29,
'-1410900499' : 29,
'637072769' : 29,
'1965322199' : 29,
'518562269' : 30,
'-1981974515' : 30,
'-2122845241' : 30,
'-546601661' : 30,
'1951510577' : 30,
'-546601663' : 30,
'-685520525' : 30,
'1452577191' : 30,
'1964322194' : 31,
'1231130622' : 31,
'-2109128294' : 31,
'919400306' : 31,
'1240942228' : 32,
'1959020412' : 32,
'1644780276' : 32,
'-1267981620' : 32,
'1954510584' : 32,
'-184629272' : 32,
'1468459052' : 32,
'-711562688' : 32,
'1090604798' : 33,
'-1260594558' : 33,
'-1267981618' : 33,
'-184629270' : 33,
'-546601662' : 34,
'-1264981609' : 35,
'-184629269' : 35,
'2019918837' : 35,
'-1267981619' : 35,
'-1132563917' : 36,
'1915115611' : 36,
'-483441751' : 36,
'-1291703975' : 36,
'-552667648' : 37,
'894092280' : 37,
'-1883842740' : 37,
'-1955422912' : 37,
'894092282' : 38,
'-1662471738' : 38,
'-1102388650' : 38,
'385323778' : 38,
'-552667646' : 38,
'502907382' : 38,
'-1173968822' : 38,
'-1291703974' : 38,
'1915115608' : 39,
'-566324004' : 39,
'1222852148' : 39,
'1105268544' : 39,
'-552667647' : 40,
'1531781083' : 40,
'-1009722861' : 40,
'-1867105455' : 40,
'894092283' : 40,
'1795601261' : 40,
'-1318332495' : 40,
'1244187411' : 40,
'-191185443' : 41,
'497472177' : 41,
'1080224867' : 41,
'1105268547' : 41,
'-332056169' : 42,
'1531781081' : 42,
'379888573' : 42,
'-1773986061' : 42,
'502907381' : 42,
'1710481895' : 42,
'1222852151' : 42,
'1244187409' : 42,
'1244187410' : 43,
'1531781080' : 43,
'1797604814' : 43,
'1080224864' : 43,
'-1946277244' : 44,
'1628745286' : 44,
'-758004836' : 44,
'23449254' : 44,
'1540427828' : 45,
'-648148160' : 45,
'2019918836' : 46,
'1239942226' : 46,
'-1259594556' : 46,
'-1493905522' : 46,
'2019918838' : 46,
'2083681824' : 46,
'-714562694' : 46,
'1234130628' : 46,
'1628745285' : 47,
'-1373528131' : 47,
'568494839' : 47,
'-2088904525' : 47,
'-1246782567' : 47,
'116115043' : 47,
'-648148157' : 47,
'1623310081' : 47,
'-1246782566' : 48,
'-1371524578' : 48,
'414589924' : 49,
'-1867105454' : 49,
'-1132563920' : 49,
'-191185442' : 49,
'-191185444' : 49,
'-415183970' : 49,
'-1773986064' : 49,
'1776865958' : 49,
'124326536' : 50,
'143300938' : 50,
'518562270' : 50,
'1951510576' : 50
})
TileCoefficients = [ # The weights for the linear summation.
0,
-0.1833823465,
-0.6633731226,
0.0079807725,
0,
-0.3440052121,
-0.1991134337,
-0.079680031,
0,
-0.1123744239,
0.1001229256,
0,
-0.0490842331,
0.0369902573,
-0.4847022735,
0,
0.4311327989,
0,
-0.3153920793,
-0.2547064898,
0.2014220538,
-0.2897059005,
-0.1852881729,
0,
-0.0927083983,
0.0572225565,
-0.2531232705,
0,
0,
-0.4587600144,
-0.3675620454,
-0.2024524793,
-0.2458626439,
0,
0,
0,
-0.1101965522,
0,
0.1093954496,
-0.0643093185,
-0.2203335787,
-0.3418014888,
-0.122113903,
-0.0737090746,
-0.0389233138,
0,
-0.0282724468,
-0.3363654026,
0,
-0.1452367042,
-0.1980596678]
SubjectStillLife = g.getcells(g.getrect())
if SubjectStillLife == g.evolve(SubjectStillLife, 1):
	g.getrect()
	BoxX = g.getrect()[0]
	BoxY = g.getrect()[1]
	Width = g.getrect()[2]
	Height = g.getrect()[3]
	for i in range(-2, Width):
		for j in range(-2, Height): # Taking snapshots over every cell that could capture a live cell
			ThisTile = TileIdentity[str(g.hash([BoxX+i,BoxY+j,3,3]))] # Which kind of tile is the one with its upper-left corner at i,j?
			Tiles[ThisTile] = Tiles[ThisTile] + 1 # Add it to the tile census
	Freq = 0
	for i in xrange(51):
		Freq = Freq + Tiles[i]*TileCoefficients[i]
	g.show("Estimated frequency of still life, log base 10: " + str(Freq))
else:
	g.show("Not a still life.")
Tanner Jacobi
Coldlander, a novel, available in paperback and as an ebook. Now on Amazon.

User avatar
calcyman
Moderator
Posts: 2938
Joined: June 1st, 2009, 4:32 pm

Re: Still life frequency estimator

Post by calcyman » October 19th, 2016, 6:50 am

Kazyan wrote:Posting this so I don't forget to improve it later. Given a still life placed in the active layer and with that layer being otherwise empty, this script estimates the frequency of that still life in all ash objects.

Currently, it's pretty bad--it has a variance of about one order of magnitude with respect to Catagolue's frequencies, and it's completely clueless for common objects. This is partly because it's heavily biased towards rare objects and partly because it's based on linear regression. I intend to rewrite it to use a really rudimentary version of a convolutional neural network, once I figure out how the relevant Python module works.

Code: Select all

# Still life frequency estimator
# Estimates the frequency of occurrence of a still life--not terribly well, though. Has a variance of 0.9537 orders of magnitude, based on multilinear regression from Catagolue's text census on October 19th 2016, at 5:10 am UTC. Still lifes in the census with 9 or fewer occurrences were omitted from the regression to obtain tile weights, because of 'survivor bias'.
Beautiful!

I'm confused why any of the weights should be positive -- this suggests that the corresponding regions are more common than empty space...?

For a second-order refinement of this, you could take the residuals (actual log(f) - fitted log(f)) and perform ridge regression based on 4-by-4 regions.
What do you do with ill crystallographers? Take them to the mono-clinic!

User avatar
Kazyan
Posts: 1247
Joined: February 6th, 2014, 11:02 pm

Re: Still life frequency estimator

Post by Kazyan » October 19th, 2016, 12:34 pm

calcyman wrote:Beautiful!

I'm confused why any of the weights should be positive -- this suggests that the corresponding regions are more common than empty space...?
Each live cell of the still life gets sampled 9 times, one for each overlapping 3x3 tile. I think those positive values are a combination of 1) correction factors for particular functional groups, and 2) what linear regression tends to do when you give it ill-fitting data--it gives you some answer, even if it's one that doesn't make sense.
calcyman wrote:For a second-order refinement of this, you could take the residuals (actual log(f) - fitted log(f)) and perform ridge regression based on 4-by-4 regions.
I don't think using 4x4 regions is feasible--there are already 50 possible 3x3 tiles for a still life.

However, thinking about how to wrangle the "convolutional neural network" idea into something my novice programming skill can handle, I now realize that I overcomplicated things for 3x3. Instead, I should have just written a custom rule with 52(?) states so that the central cell of each 3x3 'tile' evolves in one generation to the appropriate state, evolve a still life by 1 generation in that rule, then take a census of all cell states that appear. That census is then applied to the linear regression formula as normal. (As it turns out, Golly is really good at applying rules to 2-dimensional arrays of cells based on the properties of the adjacent cells...yeah...)

I suspect that a more elaborate custom rule, probably involving a runtime of more than 1 generation, will allow for even better identification of functional groups.
Tanner Jacobi
Coldlander, a novel, available in paperback and as an ebook. Now on Amazon.

User avatar
calcyman
Moderator
Posts: 2938
Joined: June 1st, 2009, 4:32 pm

Re: Still life frequency estimator

Post by calcyman » October 19th, 2016, 4:27 pm

Kazyan wrote:
calcyman wrote:For a second-order refinement of this, you could take the residuals (actual log(f) - fitted log(f)) and perform ridge regression based on 4-by-4 regions.
I don't think using 4x4 regions is feasible--there are already 50 possible 3x3 tiles for a still life.
A benefit of ridge regression over ordinary least squares is that it's still effective even when the dimension is much larger than the number of data points.
What do you do with ill crystallographers? Take them to the mono-clinic!

User avatar
Kazyan
Posts: 1247
Joined: February 6th, 2014, 11:02 pm

Re: Still life frequency estimator

Post by Kazyan » October 19th, 2016, 10:26 pm

Update: I've tried a new way of processing still lifes with a custom rule. In generation 1, the rule sets two-neighbor live cells to state 2, and three-neighbor live cells to state 3. In generation 2, every possible 3x3 tile centered on a state 2 or state 3 cell is identified by the rule. These tiles are counted up and assigned weight, and the estimator proceeds as before. This was intended to allow gathering of information from a Moore distance of 2 instead of 1, so that functional groups become more distinct.

Surprise, though--this method is actually worse than the earlier attempt, giving a variance of 1.58 instead of 0.95. It appears that that arrangements of live cells around empty cells are rather important--the custom rule left all state 0 cells alone, whereas the previous method assigned them tiles.

I'll try that ridge regression next--it works off a better starting point. :)
Tanner Jacobi
Coldlander, a novel, available in paperback and as an ebook. Now on Amazon.

Post Reply