apgsearch v1.0

For general discussion about Conway's Game of Life.
calcyman
Posts: 2200
Joined: June 1st, 2009, 4:32 pm

Re: apgsearch v1.0

simsim314 wrote:May I ask what iterator did you used?
You mean the algorithm for running GoL? I'm using Life128, which is a bespoke algorithm I wrote (and attains a twofold speedup over Golly's QuickLife).

It partitions the universe into 28-by-28 squares. They tile the universe as below, so each square has six neighbours (rather than the eight concomitant with an ordinary grid-like tiling):

Each of these is surrounded by a two-cell border, giving a covering of the universe by overlapping 32-by-32 tiles.

The algorithm iterates over all tiles that have changed since the previous iteration, and advances each of them by 2 generations at once (hence why a two-cell border is required). If cells near the edge have changed, the neighbouring tiles are alerted. Note that advancing by two generations at a time means that typical ash (which is period-2 due to the presence of blinkers) can be ignored.

Advancing a tile by two generations is accomplished by a 882-instruction straight-line routine manually written in inline x86 assembly! It uses bitwise operations (similar to your LifeAPI) and Streaming SIMD Extensions (allowing four 32-bit rows to be processed simultaneously), so there is a 128-fold parallelisation (compared with 64 for LifeAPI) -- hence the name Life128.

The assembly code is designed to mesh well with things like branch prediction (in particular, there are no branches!) and pipelining (the code has a shallowish Gantt chart, so the CPU's multiple ALUs can be used simultaneously). Consequently, this code runs in about 320 clock cycles (200 nanoseconds) on my machine, despite there being 882 instructions.
simsim314 wrote:I would suggest to port the soup search to gpu as well, this should give another order of magnitude jump.
And indeed this would be necessary for any noticeable improvement, since Life128 already uses pretty much all of the tricks available to the CPU.

Unfortunately, my laptop lacks a GPU (by which I mean it doesn't support OpenCL -- it probably has a very basic inbuilt graphics chip), so I have no means of programming a GPU version of the search. Does anyone have a suitable machine into which I can ssh?

Of course, there's no rush -- I'm currently testing v2.0 with the help of Andrew, Dave, Tom and Nathaniel.
What do you do with ill crystallographers? Take them to the mono-clinic!

simsim314
Posts: 1766
Joined: February 10th, 2014, 1:27 pm

Re: apgsearch v1.0

calcyman wrote: 882-instruction straight-line routine manually written in inline x86 assembly! It uses bitwise operations (similar to your LifeAPI) and Streaming SIMD Extensions (allowing four 32-bit rows to be processed simultaneously), so there is a 128-fold parallelisation (compared with 64 for LifeAPI)
Nice! LifeAPI works twice as fast on x64 CPU over x32 CPU. Could it be then your code will work x256 if you'll use tiles of size 60x60 on x64 CPU?

Anyway LifeAPI strong point is not so much the iterator but the API, I would be glad to replace the iterator of LifeAPI by something faster (with permission of course).
calcyman wrote:my laptop lacks a GPU (by which I mean it doesn't support OpenCL -- it probably has a very basic inbuilt graphics chip), so I have no means of programming a GPU version of the search.
I've some pretty primitive LifeAPI version written using CUDA. But if you're looking for robustness, there is a way to use OpenGL for GPU general purpose programming, which GOL iterators is perfectly fitting for.

I must admit I never had the patience to understand the way OpenGL can be used for general purpose computing, but I know it's the oldest and the most general way to do it, and I think for GOL iterator it shouldn't be too hard. Here is a link with technical introduction to the topic.

calcyman
Posts: 2200
Joined: June 1st, 2009, 4:32 pm

Re: apgsearch v1.0

simsim314 wrote:LifeAPI works twice as fast on x64 CPU over x32 CPU. Could it be then your code will work x256 if you'll use tiles of size 60x60 on x64 CPU?
Unfortunately not.

SSE has 128-bit registers, so you can either process two 64-bit integers or four 32-bit integers (or even eight 16-bit registers, if you were that way inclined). I decided that 32-by-32 would be a good size for a tile (too large, and you're wasting operations since only a small part of the tile might be active; too small, and you have too much overhead communicating between tiles).

There is a sequel to SSE called AVX, which supports 256-bit registers, but it only exists on really modern CPUs:

And if you're a time-traveller, you may want to try AVX-512:

https://en.wikipedia.org/wiki/AVX-512

SSE, on the other hand, exists on all x86-64 processors so is safe to use.
simsim314 wrote:Anyway LifeAPI strong point is not so much the iterator but the API, I would be glad to replace the iterator of LifeAPI by something faster (with permission of course).
I think that LifeAPI's existing iterator is more fit for purpose. Life128 can only step an even number of generations, which is fine for the soup search (where you just want to run a soup to stabilisation) but annoying for other applications. I suppose I could add a function to iterate by one generation, but I've had enough of coding in assembly for the time being.
Here is a link with technical introduction to the topic.
Thanks! I recall that Ready was written using OpenCL rather than OpenGL, so I'll probably ask Tim Hutton (lead developer of Ready) how to proceed.
What do you do with ill crystallographers? Take them to the mono-clinic!

simsim314
Posts: 1766
Joined: February 10th, 2014, 1:27 pm

Re: apgsearch v1.0

calcyman wrote:here is a sequel to SSE called AVX, which supports 256-bit registers, but it only exists on really modern CPUs...Life128 can only step an even number of generations
Anyway thx for the tip, I think I can figure something out on my own - I really didn't knew about the SSE and AVX and it's amazing performance boost. My relatively advanced GPU is only 8 times faster than the 8 cores i7-CPU (as the GPU is x32), so AVX 512 would actually fly as fast as GPU (!), and 256 would be only twice slower.

Getting GPU comparable performance from multi core cpu is very impressive, so thx again.
calcyman wrote:Ready was written using OpenCL rather than OpenGL
Yes definitely OpenCL is much more pleasant to develop with rather than OpenGL. Using OpenGL is kind of unnatural trickery, and it's much more simple to use OpenCL or CUDA. The disadvantage is that not all GPU support OpenCL.

The point is that you mainly need the GOL iterator on GPU, so OpenGL could work for you (it's small and relatively simple component) - thus people could use tablets and smart phones, and considering the performance boost of the GPU I would say that the average smartphone will work as fast as multi-core golly soup search today.

calcyman
Posts: 2200
Joined: June 1st, 2009, 4:32 pm

Re: apgsearch v1.0

Anyway thx for the tip, I think I can figure something out on my own - I really didn't knew about the SSE and AVX and it's amazing performance boost.
By the way, if you write your code so that it naturally vectorises, then gcc with -O3 enabled will automatically try to use SSE where appropriate (I don't know whether the same is true of AVX, but it seems likely). Often it's possible to beat -O3 by 'rolling your own' assembly, but it's much easier to make mistakes.
My relatively advanced GPU is only 8 times faster than the 8 cores i7-CPU (as the GPU is x32), so AVX 512 would actually fly as fast as GPU (!), and 256 would be only twice slower.
Hmm, in addition to the SIMD width (amount of data that can be operated upon by a single instruction), you also need to factor how many instructions the CPU/GPU can perform. In the case of the CPU, it can be higher than the clock speed due to the existence of multiple ALUs. As for the GPU, I don't know anything about the micro-architecture so cannot comment.

Thanks for the GPU advice. I'm not sure that it's worth targeting smartphones: they're not really designed for intensive computation, and they tend to use ARM rather than x86 so I would need to rewrite much of the application.
What do you do with ill crystallographers? Take them to the mono-clinic!

simsim314
Posts: 1766
Joined: February 10th, 2014, 1:27 pm

Re: apgsearch v1.0

I've recently moved to use OpenMP (the Glue++ was written entirely with OpenMP), and starting from OpenMP 4.0 there is #pragma omp simd, I'll just need to place it in the iterator region of LifeAPI.
calcyman wrote:In the case of the CPU, it can be higher than the clock speed due to the existence of multiple ALUs. As for the GPU, I don't know anything about the micro-architecture so cannot comment.
GPU is a huge bunch of pretty simple and not so fast CPUs (usually clock cycle just bellow 1GHz). The high end GPUs will contain few thousands of these CPUs and each such CPU (called cuda core or compute unit), will have inside it few threads, and some memory and cache etc. They actually not so efficient, so each assembly command might take few cycles per thread. I like to think of GPU as huge cluster of simple CPUs.
calcyman wrote:I'm not sure that it's worth targeting smartphones: they're not really designed for intensive computation

Well smart phones and tablets are for gaming, this means they have pretty nice GPU. I'll give an example: rasberi-pi2 has CPU with 100MFLOPS per core (and has 4 cores), which is 50 times slower than i7 core. But when you look at the GPU, it's about 25GFLOPS, which is about 5 i7-cores (or considering AVX single i7-core). Obviously it's still much slower than the modern GPUs, and yet it's already good enough GPU for useful computation.

Mobile devices have usually between 5 to 20 GFLOPS, and they widely available, tablets or more advanced mobiles can reach 100 GFLOPS.

I'm not sure how useful will this be, as just having 1 good GPU will be like having 100 mobile devices, on the other hand - having small downloadable app for mobile, could get much more people and devices.

The Turtle
Posts: 102
Joined: May 6th, 2015, 8:14 pm
Location: Chicago, Illinois

Re: apgsearch v1.0

Why does the Catagolue say the block is more common than the blinker (ratio is about 14:13) while this (Achim Flammenkamp's) says the block is less common that the blinker (ratio is about 53:54)?
Only two things are constant: change and the speed of light.

calcyman
Posts: 2200
Joined: June 1st, 2009, 4:32 pm

Re: apgsearch v1.0

The Turtle wrote:Why does the Catagolue say the block is more common than the blinker (ratio is about 14:13) while this (Achim Flammenkamp's) says the block is less common that the blinker (ratio is about 53:54)?
Different methods of choosing random seeds. Catagolue creates a random 16-by-16 square in an otherwise empty universe; Achim randomly filled a 2048-by-2048 torus.
What do you do with ill crystallographers? Take them to the mono-clinic!

Kazyan
Posts: 959
Joined: February 6th, 2014, 11:02 pm

Re: apgsearch v1.0

Regarding how those starting conditions influence evolution, my guess is that the extra blocks in Catagolue partly come from Herschels. If left unchecked, one of those will leave two blocks and a ship dangling off the main mass of ash. On a torus, Herschels would be stopped in their tracks much more frequently than, say, the small burst of a traffic light, because there's usually no void to reach out into; typically there will be something in the way. Indeed, Catagolue says about 3.1% of all Life objects are ships, but Flammenkamp's census estimates 0.75%. Traffic lights also have a tiny footprint for so many blinkers in one place, but reactions that create a bunch of blocks take up a lot more room--and thus they'll often get hit by something else going on in Flammenkamp's census, whereas they'll have a chance to be on the edge of the soup for apgsearch.
Tanner Jacobi
Coldlander, a novel, available in paperback and as an ebook.

The Turtle
Posts: 102
Joined: May 6th, 2015, 8:14 pm
Location: Chicago, Illinois

Re: apgsearch v1.0

calcyman wrote:
The Turtle wrote:Why does the Catagolue say the block is more common than the blinker (ratio is about 14:13) while this (Achim Flammenkamp's) says the block is less common that the blinker (ratio is about 53:54)?
Different methods of choosing random seeds. Catagolue creates a random 16-by-16 square in an otherwise empty universe; Achim randomly filled a 2048-by-2048 torus.
So if the random fill square size was increased, the block to blinker ratio will decrease?
If so, what size would set the block to blinker ratio at 1:1?
Only two things are constant: change and the speed of light.

gmc_nxtman
Posts: 1149
Joined: May 26th, 2015, 7:20 pm

Re: apgsearch v1.0

Some more ov_q7s appearing in Morley D2_+1/C4_1 searches:

Code: Select all

x = 32, y = 32, rule = B368/S245
7obobob3o9b2o2b2obo$bob2ob2o2b3obo2b2o6b4ob2o$o2bob4o4bo8b3o2bo3bo$2o 4bob2o2b2o3b2ob4o2bo2b3o$b2obobo4b2ob2o3b3ob2ob2o2b2o$bob3o2b2ob2o5b2o b2obobo2bobo$2o9b4o2bob4o4b5o$obob2obo3b3o2b2o2bo3bo4b2o$2b3o6bob3o3bo
2bo3bob2obo$2b2ob2obo6b2obob2o4bobo$3b4o2bob4ob2obo10b2o$3b2ob2obob2ob ob2ob3ob5o2bo$4b3obob4ob7o2b5ob2o$bobobo3bo2b2o4b2obob3o2b2obo$bobo2b
2o2b3o7b2obobobo2b2o$7bob4o6bo2b2o3bo$4bo3b2o2bo6b4obo$2o2bobobob2o7b 3o2b2o2bobo$ob2o2b3obob2o4b2o2bo3bobobo$2ob5o2b7ob4obob3o$bo2b5ob3ob2o
bob2obob2ob2o$2o10bob2ob4obo2b4o$3bobo4b2obob2o6bob2ob2o$ob2obo3bo2bo 3b3obo6b3o$b2o4bo3bo2b2o2b3o3bob2obobo$5o4b4obo2b4o9b2o$obo2bobob2ob2o
5b2ob2o2b3obo$2o2b2ob2ob3o3b2ob2o4bobob2o$3o2bo2b4ob2o3b2o2b2obo4b2o$o 3bo2b3o8bo4b4obo2bo$2ob4o6b2o2bob3o2b2ob2obo\$ob2o2b2o9b3obobob7o!


biggiemac
Posts: 504
Joined: September 17th, 2014, 12:21 am
Location: California, USA

Re: apgsearch v1.0

4 trillion objects marker reached!
Physics: sophistication from simplicity.

shandis
Posts: 1
Joined: July 29th, 2015, 12:31 am
Contact:

Re: apgsearch v1.0

I have tow question :
How can I use apgsearch v1.0 ?
When releasing new version ?

Billabob
Posts: 146
Joined: April 2nd, 2015, 5:28 pm

Re: apgsearch v1.0

biggiemac [Three days ago!] wrote:4 trillion objects marker reached!
We're already at 4.444 trillion objects. Is it just me, or has the search sped up over the past few days?
▄▀
▀▀▀

Alexey_Nigin
Posts: 323
Joined: August 4th, 2014, 12:33 pm
Location: Ann Arbor, MI
Contact:

Re: apgsearch v1.0

Billabob wrote:
biggiemac [Three days ago!] wrote:4 trillion objects marker reached!
We're already at 4.444 trillion objects. Is it just me, or has the search sped up over the past few days?
The latter guess is correct:

http://conwaylife.com/forums/viewtopic.php?f=7&t=1784
There are 10 types of people in the world: those who understand binary and those who don't.

gameoflifeboy
Posts: 474
Joined: January 15th, 2015, 2:08 am

Re: apgsearch v1.0

I'm guessing this was unintentional...? There should probably be a check to see whether a yl... apgcode contains only alphanumeric characters and underscores.

The Turtle
Posts: 102
Joined: May 6th, 2015, 8:14 pm
Location: Chicago, Illinois

Re: apgsearch v1.0

Hello, I'm back! (after a long time)
Has anyone modified apgsearch so that it could run differently sized soups, like 4x4, 5x5, or 32x32? I am doing a project on data analysis in school, so I thought this might be a good place for data. (I'm posting it in this thread because I still am using the Python version of apgsearch)

Thanks,
The Turtle
Only two things are constant: change and the speed of light.

Billabob
Posts: 146
Joined: April 2nd, 2015, 5:28 pm

Re: apgsearch v1.0

The Turtle wrote:Hello, I'm back! (after a long time)
Has anyone modified apgsearch so that it could run differently sized soups, like 4x4, 5x5, or 32x32? I am doing a project on data analysis in school, so I thought this might be a good place for data. (I'm posting it in this thread because I still am using the Python version of apgsearch)

Thanks,
The Turtle
I vaguely remember someone posting a 5x5 census in the Useless Discoveries thread.

EDIT: Yep, it's on page 13.
▄▀
▀▀▀

biggiemac
Posts: 504
Joined: September 17th, 2014, 12:21 am
Location: California, USA

Re: apgsearch v1.0

It is possible to modify apgsearch to run soups of different initial sizes but you should probably change it so that results are only saved locally and not to the Catagolue website. The site only holds onto the hashstring the generates the soup, so if your soups are not 16x16 (or another function from 256 bits to a soup that Catagolue has already) then the site will convert the hashstring and get very confused.

A quick overkill hack to do this for some edge length less than or equal to 16 would be to add the following statement:

Code: Select all

            if (j > 2*(edge-1) or k > edge-1 or j % 2 and k > edge-9):
continue
(where edge is replaced by whatever edge size you want) after the line in hashsoup that reads

Code: Select all

for k in xrange(8):
Above side length 16 and you need more than 256 bits, so you'll need to get multiple hashes. But chopping the soup off at a point is easy, so any NxN less than or equal to 16 is fine.

Edit: debugged and verified. Now substituting the line above and changing "get_server_address()" at the top of the file to return localhost:8080 will generate a census table for a smaller edge size that you can view but which doesn't talk to catagolue.
Physics: sophistication from simplicity.

dvgrn
Moderator
Posts: 6735
Joined: May 17th, 2009, 11:00 pm
Contact:

Re: apgsearch v1.0

biggiemac wrote:Above side length 16 and you need more than 256 bits, so you'll need to get multiple hashes. But chopping the soup off at a point is easy, so any NxN less than or equal to 16 is fine.
Really it's just as easy to use a standard Catagolue seed to generate more than 256 bits.

Just as a possibly silly example, couldn't you just append together the SHA256, SHA224, SHA384, and SHA512 hashes of a Catagolue seed string? That would give you 1376 bits, enough for up to a 37x37 soup -- and I don't think there'd be any visible repetition or non-randomness in those bits:

Code: Select all

Test seed string:  blahblahblah
SHA256: 6e80a5bf1f6e165f65965076290a61638dfde0f2972474d73b954a10962a392f
SHA224: 4cc20ca29afa8185a677c35a57baca9ce6e4c22ccf2833295e43c741
SHA384: f75b56c547197d48cfd2600861f9e9d7095f65f34cfac04a87f5bc7551172ced8dff9e0802a4b5facd6c46ea950abc82
SHA512: c1d4e766c58df58bb274cb7b9c2874dc7770686daf98951d5933dde25c84a5fd866d806fced4ef12a15c69efc029975cdff1220c3fefc203a2fb003ed16f8640
If more randomness is needed, well, modify the seed a little bit before feeding it into SHA224, SHA384, and SHA512. Or use the output string as SHA256 input to get another 256 bits, or something like that.

A for awesome
Posts: 2047
Joined: September 13th, 2014, 5:36 pm
Location: 0x-1
Contact:

Re: apgsearch v1.0

Is there any way to engineer apgsearch to recognize replicators and XORships in HighLife? It seems like some kind of filter (like the ExpungeObjects rule) could be used to get rid of all replicators (only in HighLife searches, of course); that would probably result in a large increase in speed.
x₁=ηx
V ⃰_η=c²√(Λη)
K=(Λu²)/2
Pₐ=1−1/(∫^∞_t₀(p(t)ˡ⁽ᵗ⁾)dt)

$$x_1=\eta x$$
$$V^*_\eta=c^2\sqrt{\Lambda\eta}$$
$$K=\frac{\Lambda u^2}2$$
$$P_a=1-\frac1{\int^\infty_{t_0}p(t)^{l(t)}dt}$$

http://conwaylife.com/wiki/A_for_all

Aidan F. Pierce

calcyman
Posts: 2200
Joined: June 1st, 2009, 4:32 pm

Re: apgsearch v1.0

A for awesome wrote:Is there any way to engineer apgsearch to recognize replicators and XORships in HighLife?
It already recognises them as ZZ_REPLICATOR and ZZ_LINEAR, respectively.

https://catagolue.appspot.com/census/b36s23/C1/zz
What do you do with ill crystallographers? Take them to the mono-clinic!

A for awesome
Posts: 2047
Joined: September 13th, 2014, 5:36 pm
Location: 0x-1
Contact:

Re: apgsearch v1.0

calcyman wrote:
A for awesome wrote:Is there any way to engineer apgsearch to recognize replicators and XORships in HighLife?
It already recognises them as ZZ_REPLICATOR and ZZ_LINEAR, respectively.

https://catagolue.appspot.com/census/b36s23/C1/zz
I meant that it might be possible to recognize and handle them without the need for an error-correction phase.
x₁=ηx
V ⃰_η=c²√(Λη)
K=(Λu²)/2
Pₐ=1−1/(∫^∞_t₀(p(t)ˡ⁽ᵗ⁾)dt)

$$x_1=\eta x$$
$$V^*_\eta=c^2\sqrt{\Lambda\eta}$$
$$K=\frac{\Lambda u^2}2$$
$$P_a=1-\frac1{\int^\infty_{t_0}p(t)^{l(t)}dt}$$

http://conwaylife.com/wiki/A_for_all

Aidan F. Pierce

Alexey_Nigin
Posts: 323
Joined: August 4th, 2014, 12:33 pm
Location: Ann Arbor, MI
Contact:

Re: apgsearch v1.0

• How many highly-symmetric Day & Night soups can I upload per day so that everything is OK with Catagolue?
• Where on my computer does apgsearch v1.0 store its hauls?
• What is the smallest haul size accepted by Catagolue (I have found by trial and error that it's about 10000, but I would like to know the exact value)?
There are 10 types of people in the world: those who understand binary and those who don't.

calcyman
Posts: 2200
Joined: June 1st, 2009, 4:32 pm

Re: apgsearch v1.0

Alexey_Nigin wrote:
• How many highly-symmetric Day & Night soups can I upload per day so that everything is OK with Catagolue?
• Where on my computer does apgsearch v1.0 store its hauls?
• What is the smallest haul size accepted by Catagolue (I have found by trial and error that it's about 10000, but I would like to know the exact value)?
1. I have no idea. A haul can't exceed one megabyte in size, though.
2. In some subdirectory of Golly's 'data' directory. See Help >> Python Scripting >> getdir() for the (OS-dependent) details.
3. Yes, it's 10000 soups.
What do you do with ill crystallographers? Take them to the mono-clinic!