Idea: Distributed computing simulation

Post by KittyTac » April 20th, 2018, 9:48 pm

We all know how successful apgsearch is. Why not use the same distributed method for ordinary Life simulation? Suppose I wanted to simulate a 1000000x1000000 torus of B2e3ain4-cjqtw5ceiry6-ac78/S2ac3ajknq4eiqwy5-ain6-e78, since interesting large-scale structures might appear at that size. A rendering would then be shown in a separate program that applies Golly's "Smarter Scaling" option at scales beyond 2^4:1, and could be zoomed and panned. Is this feasible?

Re: Idea: Distributed computing simulation

Post by SuperSupermario24 » April 21st, 2018, 11:09 pm

See, the thing about apgsearch is that, from my understanding, it's mostly an independent thing. Anyone can run the program with whatever parameters they want and then upload the results to Catagolue. Sure, it's "distributed computing" in the sense that Catagolue is the product of a number of different people collaboratively searching for CA objects, but all of those searches are done independently.

On the other hand, what you're suggesting is multiple people working to simulate the same pattern. Not only would that be difficult to synchronize efficiently, but there'd also have to be some sort of system for agreeing on which patterns to simulate.

It might be able to work, but I can't see it working as well as something like apgsearch.

Code: Select all

bobo2b3o2b2o2bo3bobo$obobobo3bo2bobo3bobo$obobob2o2bo2bobo3bobo$o3bobo3bo2bobobobo$o3bob3o2b2o3bobo2bo!

Re: Idea: Distributed computing simulation

Post by Hooloovoo » April 23rd, 2018, 5:35 am

Before we can simulate a single small pattern in parallel over a high-latency, low-bandwidth link like the internet, we would first need a way to do that efficiently on one computer locally.
Bigger patterns are a different story.

1000000 is around 2^20, so a 1000000x1000000 grid at one bit per cell would take up around 2^40 bits. 2^40 bits is 2^37 bytes, which works out to 128 GiB of memory. While that is a fairly large amount of memory, it's not an unreasonable amount; I have an 8-year-old server that supports that much.
I'm going to use its specs in my calculations since I'm familiar with them, and it's a fairly normal (if a bit old) mid-to-high-end server.

I'm additionally going to assume the simplest method of simulating the universe: one large array of cells, updated with line buffering so that only a single full copy of the grid needs to be stored. White/black space optimizations may be important, but I don't know enough to say what gains, if any, could be made. This algorithm can be extended to multiple threads quite simply, by having each thread evolve one horizontal slice of the whole grid -- see the sketch below.
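
To make that concrete, here's a toy sketch in Python/NumPy. Everything in it is a placeholder: the sizes are small, and I've used plain B3/S23 instead of KittyTac's rule to keep it short.

Code: Select all

import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Toy version of the big-array approach: one source array holds the
# whole torus, a second array receives the next generation, and each
# thread evolves its own horizontal slice. N and THREADS are toy values.

N = 1024           # N x N torus (the real proposal is 10^6 x 10^6)
THREADS = 4

def step_slice(src, dst, y0, y1):
    # Rows y0..y1-1 plus one halo row above and below (torus wrap).
    block = src[np.arange(y0 - 1, y1 + 1) % N]
    counts = sum(np.roll(np.roll(block, dy, 0), dx, 1)
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if (dy, dx) != (0, 0))
    # Vertical rolls wrap the *block*, not the torus, but the wrapped
    # values only land in the two halo rows discarded here:
    c, alive = counts[1:-1], block[1:-1]
    dst[y0:y1] = (c == 3) | ((alive == 1) & (c == 2))

def step(src):
    dst = np.empty_like(src)
    bounds = np.linspace(0, N, THREADS + 1, dtype=int)
    with ThreadPoolExecutor(THREADS) as pool:
        for f in [pool.submit(step_slice, src, dst, bounds[i], bounds[i + 1])
                  for i in range(THREADS)]:
            f.result()    # propagate any exceptions from the workers
    return dst

grid = np.random.randint(0, 2, (N, N), dtype=np.uint8)
for _ in range(10):
    grid = step(grid)

A real implementation would want the inner loop in C or with SIMD, but the slicing structure would be the same.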

According to http://conwaylife.com/forums/viewtopic. ... 50&p=31850 you can simulate around 1 Life cell per clock cycle per core on AMD K12, which is around the same age as my CPU. That doesn't tell us how fast KittyTac's rule can be simulated, but I'm going to guess it will be significantly slower: 10 cycles or more per cell. Since my CPU runs at 2.4 GHz, I estimate I could simulate 240M cells per core-second; multiplying by the number of cores, 12, brings us up to 2.88 Gcells/s. At 2^40 cells per grid, that works out to one generation every 380 seconds or so.
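
Spelling that arithmetic out (the 10 cycles/cell figure is my guess above, not a measurement):

Code: Select all

# Back-of-the-envelope check of the numbers above.
clock_hz        = 2.4e9     # 2.4 GHz
cycles_per_cell = 10        # guessed cost of KittyTac's rule (assumption)
cores           = 12

per_core = clock_hz / cycles_per_cell   # 240e6 cells per core-second
total    = per_core * cores             # 2.88e9 cells per second

# Using the 2^40-cell round-up; with exactly 10^12 cells it's ~347 s.
print(2**40 / total, "seconds per generation")   # ~382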

To speed things up, you could indeed split the work between multiple computers, though doing it over the internet would be hard because 128 GiB is a lot of data to distribute to workers. Each worker could get 10k or 100k rows and evolve only those. You would need to deal with the boundaries between the different sections somehow, but it could be done for relatively low generation rates -- a mock-up of that boundary handling is sketched below.
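
Here's a single-process mock-up of that boundary handling, with a plain dict standing in for the network and toy sizes throughout. Each worker owns a band of rows and trades one "ghost" row per neighbour per generation:

Code: Select all

import numpy as np

# Mock-up of the worker/boundary scheme. A dict stands in for the
# network; N, WORKERS and the 10-generation run are toy values.
N, WORKERS = 512, 4
ROWS = N // WORKERS
full = np.random.randint(0, 2, (N, N), dtype=np.uint8)
bands = [full[w * ROWS:(w + 1) * ROWS].copy() for w in range(WORKERS)]

def step_band(band, ghost_above, ghost_below):
    # One worker's generation: stack the ghost rows on, count
    # neighbours, apply the rule (plain B3/S23 here as a stand-in),
    # and drop the ghost rows again.
    ext = np.vstack([ghost_above, band, ghost_below])
    counts = sum(np.roll(np.roll(ext, dy, 0), dx, 1)
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if (dy, dx) != (0, 0))
    c = counts[1:-1]
    return ((c == 3) | ((band == 1) & (c == 2))).astype(np.uint8)

for gen in range(10):
    # The per-generation "network" traffic: each worker receives the
    # last real row of the band above and the first real row of the
    # band below (wrapping top to bottom, since the grid is a torus).
    mail = {w: (bands[(w - 1) % WORKERS][-1:],
                bands[(w + 1) % WORKERS][:1])
            for w in range(WORKERS)}
    bands = [step_band(bands[w], *mail[w]) for w in range(WORKERS)]

Trading only one ghost row per neighbour means a network round trip every single generation; thicker overlapping regions would let workers run many generations between exchanges, at the cost of redundant computation.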

So a 1Mx1M grid is difficult to simulate, but doable.

Re: Idea: Distributed computing simulation

Post by KittyTac » April 25th, 2018, 9:37 pm

What about 100kx100k?

Re: Idea: Distributed computing simulation

Post by dvgrn » April 26th, 2018, 10:52 am

KittyTac wrote: What about 100kx100k?
Divide Hooloovoo's estimates by 100, as a first guess -- one tick every four seconds. But to distribute the effort, no matter what the size is, you'd need unrealistically fast communication between the computers in the distributed network.

Let's say each computer gets a 4096-row strip of the full pattern, and furthermore adjacent strips overlap by 2048 rows, so each cell in the full pattern is represented on two CPUs.

Since effects propagate at most one row per tick, after a strip is calculated 1024 ticks into the future the outer 1024 rows on each side can no longer be trusted, and only the central 2048xN part of the strip is still valid -- and that entire 2048xN section will have to be communicated over the network. The computer handling the adjacent strip above will need the upper 1024xN half of that section, and the computer handling the adjacent strip below will need the other 1024xN chunk.

So in aggregate, if I'm thinking about this right, we're communicating over the network an amount of data equal to the size of the entire pattern, every 1024 ticks. If we make the strips taller we can wait proportionally longer before doing that communication -- but by doing that we're also making the simulation less distributable.
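
A quick arithmetic check of that claim, using the strip sizes above and one bit per cell as in Hooloovoo's memory estimate:

Code: Select all

# Strip height H, overlap H/2  ->  strips start every H/2 rows,
# each strip re-syncs every H/4 ticks (the speed-of-light limit),
# and each sync ships the still-valid central H/2 rows of the strip.
N = 1_000_000            # rows and columns in the full pattern
H = 4096                 # strip height

strips         = N / (H / 2)       # one strip per 2048 rows
rows_per_sync  = H / 2             # the 2048-row central part
ticks_per_sync = H / 4             # 1024 ticks

cells_shipped = strips * rows_per_sync * N   # = N*N: the whole pattern
print(cells_shipped == N * N)                # True
print(cells_shipped / 8 / 2**30, "GiB every", int(ticks_per_sync), "ticks")
# -> ~116 GiB every 1024 ticks

# Per tick that's 4*N*N/H cells of traffic: doubling H halves the
# bandwidth needed, but also halves the number of usable machines.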

Ultimately the obvious way to avoid this over-communication problem is to make the strip as tall as the entire pattern. In other words, it probably makes more sense to limit this kind of distributed simulation to multiple cores all sharing the same memory -- that way there's no huge amount of data to communicate across the low-bandwidth Internet.

-- This is all basically talking about a distributed QuickLife-type simulation. There are some more possible shortcuts available for distributed HashLife, but here again, it's a difficult problem even just to synchronize multiple cores so that they can all access and modify the same hashtable without stepping on each other's toes.

With separate computers on a distributed network communicating via the Internet, it seems as if each computer would really have to have a complete copy of the hashtable for the entire pattern. If each computer only keeps a piece of the hashtable, you won't get HashLife's famous exponential speedup -- every lookup would need a round trip across the network.
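
To put rough numbers on that (both figures below are ballpark assumptions, not measurements): an in-memory hashtable probe is on the order of 100 ns, while an Internet round trip costs tens of milliseconds, so a remote lookup is something like five or six orders of magnitude slower.

Code: Select all

# Ballpark comparison of a local vs. remote hashtable lookup.
local_probe  = 100e-9    # ~100 ns: in-memory hash lookup (assumed)
internet_rtt = 50e-3     # ~50 ms: round trip between workers (assumed)

print(internet_rtt / local_probe)   # 500000.0 -- the per-lookup penalty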

But if a single computer can store the entire hashtable, it's not clear what work is left that can be usefully distributed -- it's probably faster to have one computer do the work, and dodge all the communication and synchronization issues... It's always possible that there's something clever I'm not thinking of, though!

There are certainly special-case patterns that could be successfully partitioned up to run well on a distributed network, but it seems to me those are just the obvious highly organized cases where different parts of the pattern don't interact with each other very much.
