ConwayLife.com - A community for Conway's Game of Life and related cellular automata

apgsearch v5.0

For general discussion about Conway's Game of Life.

Re: apgsearch v5.0

Postby muzik » April 2nd, 2019, 10:02 am

Can methuselahs also be classified in terms of the diversity of their output, i.e. how many different objects are recorded in the ash?

Re: apgsearch v5.0

Postby testitemqlstudop » April 3rd, 2019, 5:23 am

Yes. Am submitting a PR....

Re: apgsearch v5.0

Postby wildmyron » April 4th, 2019, 5:21 am

I thought I would make use of Google's cloud computing offer of US$300 credit for new users to test the performance of the new CUDA search on a range of GPUs. See below for the results of some tests I ran, mostly averaged over 100,000,000 soups and using sufficient CPU threads to ensure the search was purely GPU bound.

I was very surprised by the result from the Tesla K80. It's a slightly older GPU but it's no slouch. Within Google Compute Engine (GCE) it's not possible to attach this GPU to the newer Skylake Xeon processors, but I don't believe this explains the results. My result for the V100 is about 3% under calcyman's, perhaps due to the VM environment or other system architecture differences. Not sure. I've also provided the hourly cost estimates from GCE (in my local currency, I believe; multiply by 0.7 for USD), which shows that the V100 is the most economical platform within GCE, at a cost of $1.42 / billion soups.

It seems that somebody else has started searching G1 in earnest today. From the haul submission rate my guess is that they are using a Tesla P100.

@calcyman: Any idea why the CUDA search always has one CPU running at 100%? This is independent of GPU used and in addition to the CPU search threads which are processing the interesting soups.

==========

Performance test of apgmera v5.063-ll2.2.7, GPU search of C1 symmetry using Google Compute Engine
VM image: Intel optimized Deep Learning Image: Base m22 (with Intel® MKL and CUDA 10.0)
gcc version 6.3.0 20170516
nvcc release 10.0, V10.0.130
NVIDIA driver version: 410.72
CUDA version 10.0

CPU platforms
apgluxe built with './recompile.sh --profile' separately for each CPU platform

1) Intel Xeon CPU @ 2.20GHz (Broadwell, AVX2)
CPU search speed: 5000 soups/s/core

2) Intel Xeon CPU @ 2.00GHz (Skylake, AVX-512)
CPU search speed: 8900 soups/s/core

GPU platforms
apgluxe built with './recompile.sh --cuda' separately for each CPU platform

1) n1-standard-2 (2 vCPUs, 7.5 GB memory), Intel Xeon CPU @ 2.20GHz (Broadwell), NVIDIA Tesla K80 GPU (12G VRAM) ($0.42/hr)
GPU search speed: 41300 soups/s (GPU utilisation 97%, -u 8192 -p 1)

2) n1-highcpu-4 (4 vCPUs, 3.6 GB memory), Intel Xeon CPU @ 2.00GHz (Skylake), NVIDIA Tesla P4 GPU (8G VRAM) ($0.58/hr)
GPU search speed: 82000 soups/s (GPU utilisation 99%, -u 8192 -p 2) (Could safely run this with -p 1 on 2 vCPU system)

3) n1-highcpu-4 (4 vCPUs, 3.6 GB memory), Intel Xeon CPU @ 2.00GHz (Skylake), NVIDIA Tesla T4 GPU (16G VRAM) ($0.85/hr)
GPU search speed: 115000 soups/s (GPU utilisation 98%, -u 8192 -p 2)

4) n1-highcpu-4 (4 vCPUs, 3.6 GB memory), Intel Xeon CPU @ 2.00GHz (Skylake), NVIDIA Tesla P100 GPU (16G VRAM) ($1.24/hr)
GPU search speed: 187000 soups/s (GPU utilisation 97%, -u 8192 -p 2)

5) custom (6 vCPUs, 5.5 GB memory), Intel Xeon CPU @ 2.00GHz (Skylake), NVIDIA Tesla V100 GPU (16G VRAM) ($1.92/hr)
GPU search speed: 376000 soups/s (GPU utilisation 92%, -u 8192 -p 4) (occasionally CPU bound with -p 3)
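
The cost-per-billion-soups figure can be reproduced from the hourly prices and search speeds listed above. The following is a minimal sketch in Python; the function and variable names are illustrative only and are not part of apgsearch, and the prices are taken exactly as quoted (in whatever currency GCE reported them).

```python
# (GPU name, hourly price as quoted in the post, measured soups per second)
GPU_RESULTS = [
    ("Tesla K80", 0.42, 41_300),
    ("Tesla P4", 0.58, 82_000),
    ("Tesla T4", 0.85, 115_000),
    ("Tesla P100", 1.24, 187_000),
    ("Tesla V100", 1.92, 376_000),
]


def cost_per_billion_soups(price_per_hour: float, soups_per_second: float) -> float:
    """Price of searching 10^9 soups at a constant rate."""
    hours_needed = 1e9 / soups_per_second / 3600.0
    return price_per_hour * hours_needed


def rank_by_cost(results):
    """Return (name, cost per billion soups) pairs, cheapest first."""
    costs = [(name, cost_per_billion_soups(price, rate)) for name, price, rate in results]
    return sorted(costs, key=lambda item: item[1])


if __name__ == "__main__":
    for name, cost in rank_by_cost(GPU_RESULTS):
        print(f"{name:11s} ${cost:.2f} per billion soups")
```

With these inputs the V100 comes out cheapest and the K80 most expensive, which matches the conclusion in the post.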

Re: apgsearch v5.0

Postby Macbi » April 4th, 2019, 5:29 am

wildmyron wrote:which shows that the V100 is the most economical platform within GCE, at a cost of $1.42 / billion soups
Wow. Even with the impressive speed of GPUs it would still cost $25320.38 to overtake C1.
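
For reference, the quoted total can be turned back into a soup count. This is a rough sketch that assumes the total was obtained by multiplying a number of soups by the $1.42 per billion figure; the post does not state the soup count itself, so the result is only an implied value and the function names are made up for illustration.

```python
COST_PER_BILLION = 1.42        # V100 estimate quoted from wildmyron's post
QUOTED_TOTAL_COST = 25320.38   # total quoted in the post above


def implied_soups(total_cost: float, cost_per_billion: float) -> float:
    """Number of soups that the given budget buys at the given unit cost."""
    return total_cost / cost_per_billion * 1e9


def cost_for_soups(soups: float, cost_per_billion: float) -> float:
    """Inverse of implied_soups: budget needed for a given number of soups."""
    return soups / 1e9 * cost_per_billion


if __name__ == "__main__":
    soups = implied_soups(QUOTED_TOTAL_COST, COST_PER_BILLION)
    print(f"Implied search size: {soups / 1e12:.2f} trillion soups")
```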

Re: apgsearch v5.0

Postby calcyman » April 4th, 2019, 8:00 am

Wow -- thanks for the comprehensive analysis!

wildmyron wrote:@calcyman: Any idea why the CUDA search always has one CPU running at 100%? This is independent of GPU used and in addition to the CPU search threads which are processing the interesting soups.


Yes, that confused me as well: apparently, the CUDA drivers busy-wait (!!!) for the GPU whenever there's a dependency. https://devtalk.nvidia.com/default/topic/794833/100-cpu-usage-when-running-cuda-code/
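
To illustrate the general difference between busy-waiting and a blocking wait, here is a small analogy using plain Python threads. It is not CUDA driver code and says nothing about how apgsearch is implemented; the "GPU job" is just a sleeping worker thread, and all names are hypothetical. (The CUDA runtime does offer a `cudaDeviceScheduleBlockingSync` flag via `cudaSetDeviceFlags`, but whether it would help here is an untested assumption.)

```python
import threading
import time


def busy_wait(done: threading.Event) -> int:
    """Spin until `done` is set; returns how many times the flag was polled."""
    polls = 0
    while not done.is_set():
        polls += 1  # the waiting thread keeps a CPU core busy the whole time
    return polls


def blocking_wait(done: threading.Event) -> bool:
    """Sleep inside the OS until `done` is set; returns True once it is set."""
    return done.wait(timeout=5.0)


def simulate(wait_fn, work_seconds: float = 0.05):
    """Run a fake 'GPU job' in a worker thread and wait for it with wait_fn.

    Returns (value returned by wait_fn, CPU seconds used by this process).
    """
    done = threading.Event()

    def fake_gpu_job():
        time.sleep(work_seconds)  # stands in for a kernel running on the GPU
        done.set()

    worker = threading.Thread(target=fake_gpu_job)
    cpu_before = time.process_time()
    worker.start()
    result = wait_fn(done)
    worker.join()
    return result, time.process_time() - cpu_before


if __name__ == "__main__":
    polls, cpu_spin = simulate(busy_wait)
    _, cpu_block = simulate(blocking_wait)
    print(f"busy-wait: {polls} polls, {cpu_spin:.3f} s CPU")
    print(f"blocking:  {cpu_block:.3f} s CPU")
```

The busy-waiting variant typically reports CPU time close to the full waiting period, which is the same symptom as one core sitting at 100%.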


wildmyron wrote:I was very surprised with the result from the Tesla K80. It's a slightly older GPU but it's no slouch. Within Google Compute Engine (GCE) it's not possible to attach this GPU to the newer Skylake Xeon processors, but I don't believe this explains the results.


Can you run watch -n 0.1 nvidia-smi whilst apgsearch is running? (It might have something to do with the fact that a K80 card is usually dual-GPU, whereas apgsearch currently only supports one GPU.)


1) Intel Xeon CPU @ 2.20GHz (Broadwell, AVX2)
CPU search speed: 5000 soups/s/core

2) Intel Xeon CPU @ 2.00GHz (Skylake, AVX-512)
CPU search speed: 8900 soups/s/core


Those look reasonable given the clock speeds. (By the way, heuristically apgsearch running with -p 4 on a 4-core machine is about 3.7 times faster than running -p 0, because a hyperthreaded CPU with two virtual cores isn't quite the same as having two physical cores.)

Macbi wrote:
wildmyron wrote:which shows that the V100 is the most economical platform within GCE, at a cost of $1.42 / billion soups
Wow. Even with the impressive speed of GPUs it would still cost $25320.38 to overtake C1.


Indeed. Of course, it underestimates the value of b3s23/C1, since the GPUs are currently slightly cheating by not producing a full census of all objects produced. It would be interesting to see how much they slow down after that code has been added. (For CPUs, this stage consumes about 15% of time on AVX-512, or 10% on AVX2.)

It's nontrivial to even determine how to implement the object detection on the GPU. I was thinking that it might be best placed in the schedule() kernel, activated whenever the soup is deemed *uninteresting* (as the interesting soups are already fully censused by the CPU).

Re: apgsearch v5.0

Postby muzik » April 4th, 2019, 8:05 am

Just one more methuselah suggestion: how about recording ones with sufficiently large bounding boxes or bounding diamonds, assuming all spaceships are deleted, and excluding soups with yl?

messiest_

Re: apgsearch v5.0

Postby wildmyron » April 4th, 2019, 10:53 am

calcyman wrote:
wildmyron wrote:@calcyman: Any idea why the CUDA search always has one CPU running at 100%? This is independent of GPU used and in addition to the CPU search threads which are processing the interesting soups.

Yes, that confused me as well: apparently, the CUDA drivers busy-wait (!!!) for the GPU whenever there's a dependency.

Well, that is bizarre.

calcyman wrote:
wildmyron wrote:I was very surprised with the result from the Tesla K80. It's a slightly older GPU but it's no slouch. Within Google Compute Engine (GCE) it's not possible to attach this GPU to the newer Skylake Xeon processors, but I don't believe this explains the results.

Can you run watch -n 0.1 nvidia-smi whilst apgsearch is running? (It might have something to do with the fact that a K80 card is usually dual-GPU, whereas apgsearch currently only supports one GPU.)

I can't spin up another VM with a GPU at the moment as there's a global quota which I only requested be raised to 1 to allow the tests to be carried out - and I'm currently using that GPU to stay ahead of Alex Greason in the G1 census :)

But I did run nvidia-smi intermittently while testing the K80; that's where the 97% utilisation figure comes from. GCE allocates the two GPUs on a single K80 card as separate resources, i.e. the machine provisioned for this test only had access to one of the two GPUs on the K80 card. I was at first confused about the number of CUDA cores quoted in this table, but I've verified that it is the number of CUDA cores per GPU.

calcyman wrote:(By the way, heuristically apgsearch running with -p 4 on a 4-core machine is about 3.7 times faster than running -p 0, because a hyperthreaded CPU with two virtual cores isn't quite the same as having two physical cores.)

I presume that when I provision a 4 vCPU system in GCE, what I actually get is 4 hyperthreads on two physical cores. However, the true situation is rather opaque to me and I don't think it's something GCE customers have any control over.

On another note, I've noticed that the D4_+2 search runs much slower on these AVX-512 CPUs - even when compiled with --profile. I see around 4000 soups/s, which is less than half the C1 soup rate and slower than the desktop AVX2 CPU I have which only reaches around 7000 soups/s for C1. (Sorry, I can't remember the D4_+2 soup speed for direct comparison, but I think it was around 5000).

Edit: Mystery solved. I just tested this on the AVX-512 CPUs and it's due to hyperthreading. When only one CPU on the core is busy, the search speed jumps from 4100 to 5600 soups/s, a much larger difference than that noted for C1 above.
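
The hyperthreading effect can be put into numbers with a short calculation. This sketch assumes that both quoted rates are per search thread and that each physical core exposes two hyperthreads; neither assumption is confirmed in the post, and the function name is invented for illustration.

```python
def hyperthreading_summary(single_thread_rate: float,
                           shared_thread_rate: float,
                           threads_per_core: int = 2) -> dict:
    """Compare one busy thread per core against all hyperthreads busy.

    single_thread_rate: soups/s when only one thread runs on the core
    shared_thread_rate: soups/s per thread when all hyperthreads are busy
    """
    core_throughput = shared_thread_rate * threads_per_core
    return {
        "per_thread_slowdown": 1 - shared_thread_rate / single_thread_rate,
        "core_throughput": core_throughput,
        "core_speedup": core_throughput / single_thread_rate,
    }


if __name__ == "__main__":
    # D4_+2 rates quoted in the post for the AVX-512 (Skylake) machine.
    summary = hyperthreading_summary(single_thread_rate=5600, shared_thread_rate=4100)
    print(f"Per-thread slowdown: {summary['per_thread_slowdown']:.1%}")
    print(f"Per-core throughput: {summary['core_throughput']} soups/s")
    print(f"Gain from using both hyperthreads: x{summary['core_speedup']:.2f}")
```

Under these assumptions, each thread slows down by roughly a quarter when its sibling is busy, but the core as a whole still processes more soups per second with both hyperthreads in use.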

Re: apgsearch v5.0

Postby calcyman » April 5th, 2019, 12:04 pm

I wrote:It's nontrivial to even determine how to implement the object detection on the GPU. I was thinking that it might be best placed in the schedule() kernel, activated whenever the soup is deemed *uninteresting* (as the interesting soups are already fully censused by the CPU).


Checking for 2-periodicity at the end causes the speed to drop from 385k to 300k soups/second, and doesn't increase the number of interesting soups greatly (essentially, it just means the CPU handles soups containing pulsars). I think loop unrolling might be able to decrease the gap between 385k and 300k.

EDIT: Loop unrolling didn't quite work, but moving the functionality into a separate kernel with more threads per block (768 instead of 128) improves this to 350k.
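
The size of the slowdown in each case follows directly from the quoted rates. A small sketch of the arithmetic (helper names are illustrative, and the rates are the rounded figures from the post):

```python
def relative_slowdown(baseline_rate: float, modified_rate: float) -> float:
    """Fraction of throughput lost relative to the baseline."""
    return 1 - modified_rate / baseline_rate


def extra_time_per_soup_us(baseline_rate: float, modified_rate: float) -> float:
    """Additional average time per soup, in microseconds."""
    return (1 / modified_rate - 1 / baseline_rate) * 1e6


if __name__ == "__main__":
    baseline = 385_000  # soups/s without the 2-periodicity check
    for label, rate in [("check in same kernel", 300_000),
                        ("check in separate kernel", 350_000)]:
        print(f"{label}: {relative_slowdown(baseline, rate):.1%} slower, "
              f"+{extra_time_per_soup_us(baseline, rate):.2f} us per soup")
```

On these figures the separate kernel reduces the penalty from about 22% to about 9%.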

Re: apgsearch v5.0

Postby muzik » April 9th, 2019, 3:43 pm

Would it be possible to somehow census and record certain constellations of objects, such as the more obscure familiar fours?

Re: apgsearch v5.0

Postby muzik » April 26th, 2019, 3:10 pm

testitemqlstudop wrote:Yes. Am submitting a PR....


What's the current status on it?

Re: apgsearch v5.0

Postby Ch91 » May 5th, 2019, 1:46 am

I tried running the pre-compiled version of this (as linked to from https://gitlab.com/apgoucher/apgmera) on my computer, and despite the soup search apparently proceeding normally it doesn't appear that my haul results were ever uploaded to Catagolue. Could I be doing something wrong with it?

Re: apgsearch v5.0

Postby Ian07 » May 5th, 2019, 7:20 am

Ch91 wrote:I tried running the pre-compiled version of this (as linked to from https://gitlab.com/apgoucher/apgmera) on my computer, and despite the soup search apparently proceeding normally it doesn't appear that my haul results were ever uploaded to Catagolue. Could I be doing something wrong with it?

What's your Catagolue username?

Re: apgsearch v5.0

Postby Ch91 » May 5th, 2019, 11:24 am

Ian07 wrote:
Ch91 wrote:I tried running the pre-compiled version of this (as linked to from https://gitlab.com/apgoucher/apgmera) on my computer, and despite the soup search apparently proceeding normally it doesn't appear that my haul results were ever uploaded to Catagolue. Could I be doing something wrong with it?

What's your Catagolue username?

Ch91. I tried looking at my user page, but it said I never uploaded any soups despite the precompiled program running without any discernible issues.

Re: apgsearch v5.0

Postby dvgrn » May 5th, 2019, 11:44 am

Ch91 wrote:Ch91. I tried looking at my user page, but it said I never uploaded any soups despite the precompiled program running without any discernible issues.

How did you invoke apgluxe on the command line -- did you include a -k parameter?

For example, one of my recent invocations was

./apgluxe --rule b3s12-ae34ceit -n 10000 -k {payosha256key}
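
For anyone scripting their searches, a command line like the one above can be assembled programmatically. This is only a sketch: it uses the `--rule`, `-n`, `-k` and `-p` options that appear in this thread, the helper name `build_apgluxe_command` is hypothetical, and the executable path is an assumption about the local setup. The key is passed through unchanged, since altering its case would change the key.

```python
import shlex
from typing import List, Optional


def build_apgluxe_command(rule: str,
                          soups_per_haul: int,
                          key: str,
                          executable: str = "./apgluxe",
                          threads: Optional[int] = None) -> List[str]:
    """Build an argument list for apgluxe without running it."""
    if not key:
        raise ValueError("a payosha256 key is required for uploads to be credited")
    cmd = [executable, "--rule", rule, "-n", str(soups_per_haul), "-k", key]
    if threads is not None:
        cmd += ["-p", str(threads)]  # number of parallel search threads
    return cmd


if __name__ == "__main__":
    # "example_key" is a placeholder, not a real payosha256 key.
    cmd = build_apgluxe_command("b3s12-ae34ceit", 10000, "example_key")
    print(shlex.join(cmd))
    # To actually start the search, pass `cmd` to subprocess.run(cmd, check=True).
```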

Re: apgsearch v5.0

Postby Ch91 » May 5th, 2019, 11:59 am

dvgrn wrote:
Ch91 wrote:Ch91. I tried looking at my user page, but it said I never uploaded any soups despite the precompiled program running without any discernible issues.

How did you invoke apgluxe on the command line -- did you include a -k parameter?

For example, one of my recent invocations was

./apgluxe --rule b3s12-ae34ceit -n 10000 -k {payosha256key}

I used the pre-compiled version linked from Adam Goucher's website, which for your convenience is linked here (https://catagolue.appspot.com/binaries/apgluxe-windows-x86_64.exe), so technically it was never invoked on the command line. I recall entering the suggested values for the haul size and threads used when it initialized.

For what it's worth, I used the initial values the program selected and made sure to enter my payosha256 key. In retrospect I notice that my key is in fact ch91, but I entered it into the program as Ch91; could that have been a factor?

Edit: Entering my key with the right case appears to have fixed it. I'll have to remember that the payosha256 key entry is case sensitive next time.

Re: apgsearch v5.0

Postby testitemqlstudop » May 7th, 2019, 4:56 am

I know I haven't been around for a while.

A word of notice: Upgrading gcc from 8.3 to 9.1 gives (sometimes) a nominal improvement of 100 to 200 h/s.
For most operating systems other than Ubuntu 19.04 you need to compile gcc yourself, which also contributes to the improvement.

apgluxe w/ gcc 8.3, 23.3 kh/s avg, ending 21.4 kh/s
Greetings, this is apgluxe v5.08-ll2.2.12, configured for b3s23/C1.

Lifelib version: ll2.2.12
Compiler version: 8.3.0
Python version: '3.7.3rc1 (default, Mar 13 2019, 11:01:15)  [GCC 8.3.0]'

Using seed k_bVZPgBTkBhku
Instruction set AVX2 detected
Running 1000000 soups per haul:
Linear-growth pattern detected: yl384_1_59_7aeb1999980c43b4945fb7fcdb023326
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
b3s23/C1: 300000 soups completed (26766.595 soups/second current, 26766.595 overall).
Rare oscillator detected: xp4_37bkic
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
Linear-growth pattern detected: yl384_1_59_7aeb1999980c43b4945fb7fcdb023326
Linear-growth pattern detected: yl384_1_59_7aeb1999980c43b4945fb7fcdb023326
b3s23/C1: 600000 soups completed (22977.941 soups/second current, 24726.973 overall).
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
Linear-growth pattern detected: yl384_1_59_7aeb1999980c43b4945fb7fcdb023326
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
b3s23/C1: 900000 soups completed (21508.460 soups/second current, 23551.578 overall).
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
b3s23/C1: 1000000 soups completed (21422.451 soups/second current, 23319.264 overall).
----------------------------------------------------------------------
1000000 soups completed.
Attempting to contact payosha256.
Payosha256 authentication succeeded.
***********************************************



apgluxe w/ gcc 9.1 self-compiled, 24 kh/s avg, 21.5 kh/s ending
Greetings, this is apgluxe v5.08-ll2.2.12, configured for b3s23/C1.

Lifelib version: ll2.2.12
Compiler version: 9.1.0
Python version: '3.7.3rc1 (default, Mar 13 2019, 11:01:15)  [GCC 8.3.0]'

Using seed k_qgxx6uiC9FR4
Instruction set AVX2 detected
Running 1000000 soups per haul:
Rare oscillator detected: xp3_695qc8zx33
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
Rare oscillator detected: xp8_gk2gb3z11
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
b3s23/C1: 300000 soups completed (26343.519 soups/second current, 26343.519 overall).
Linear-growth pattern detected: yl384_1_59_7aeb1999980c43b4945fb7fcdb023326
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
Linear-growth pattern detected: yl384_1_59_7aeb1999980c43b4945fb7fcdb023326
Rare oscillator detected: xp4_37bkic
Rare oscillator detected: xp4_37bkic
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
b3s23/C1: 600000 soups completed (26251.313 soups/second current, 26297.335 overall).
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
Rare oscillator detected: xp8_gk2gb3z11
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
Linear-growth pattern detected: yl384_1_59_7aeb1999980c43b4945fb7fcdb023326
b3s23/C1: 900000 soups completed (21373.611 soups/second current, 24421.350 overall).
b3s23/C1: 1000000 soups completed (21547.080 soups/second current, 24099.870 overall).
----------------------------------------------------------------------
1000000 soups completed.
Attempting to contact payosha256.
Payosha256 authentication succeeded.
***********************************************
                                                                                                                                           
Connection was successful.
New seed: k_jCLsTvFGMMaK; iterations = 1; quitByUser = 0
Terminating...
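
The progress lines in logs like the two above can be parsed to compare runs. The following sketch assumes the line format shown in this post; the regular expression and helper names are illustrative and not part of apgluxe. Note that each log is a single run with a different seed, so a difference of a few percent may simply be noise.

```python
import re

# Matches e.g. "b3s23/C1: 300000 soups completed (26766.595 soups/second current, 26766.595 overall)."
PROGRESS_RE = re.compile(
    r"^(?P<rule>\S+): (?P<soups>\d+) soups completed "
    r"\((?P<current>[\d.]+) soups/second current, (?P<overall>[\d.]+) overall\)\.$"
)

GCC_8_3_LOG = """\
b3s23/C1: 300000 soups completed (26766.595 soups/second current, 26766.595 overall).
Rare oscillator detected: xp4_37bkic
b3s23/C1: 600000 soups completed (22977.941 soups/second current, 24726.973 overall).
b3s23/C1: 900000 soups completed (21508.460 soups/second current, 23551.578 overall).
b3s23/C1: 1000000 soups completed (21422.451 soups/second current, 23319.264 overall).
"""

GCC_9_1_LOG = """\
b3s23/C1: 300000 soups completed (26343.519 soups/second current, 26343.519 overall).
b3s23/C1: 600000 soups completed (26251.313 soups/second current, 26297.335 overall).
b3s23/C1: 900000 soups completed (21373.611 soups/second current, 24421.350 overall).
b3s23/C1: 1000000 soups completed (21547.080 soups/second current, 24099.870 overall).
"""


def parse_progress(log_text: str):
    """Return a list of (soups completed, current rate, overall rate) tuples."""
    rows = []
    for line in log_text.splitlines():
        match = PROGRESS_RE.match(line.strip())
        if match:
            rows.append((int(match["soups"]), float(match["current"]), float(match["overall"])))
    return rows


def final_overall_rate(log_text: str):
    """Overall soups/second from the last progress line, or None if there is none."""
    rows = parse_progress(log_text)
    return rows[-1][2] if rows else None


def relative_gain(old_log: str, new_log: str) -> float:
    """Fractional change in final overall rate from old_log to new_log."""
    return final_overall_rate(new_log) / final_overall_rate(old_log) - 1


if __name__ == "__main__":
    print(f"gcc 8.3: {final_overall_rate(GCC_8_3_LOG):.0f} soups/s overall")
    print(f"gcc 9.1: {final_overall_rate(GCC_9_1_LOG):.0f} soups/s overall")
    print(f"relative gain: {relative_gain(GCC_8_3_LOG, GCC_9_1_LOG):.1%}")
```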

Re: apgsearch v5.0

Postby testitemqlstudop » May 18th, 2019, 6:56 am

Having acquired a GTX 750 (a very crappy GPU), I found that it can soup-search G1 at roughly the same speed as 4 Intel i3 cores (no hyperthreading), so it's still very impressive.

Loops: 7374
Tiles in epoch: 420026281
Interesting universes: 2680 out of 1000000
b3s23/G1: 1000000 soups completed (18827.073 soups/second current, 18827.073 overall).

Re: apgsearch v5.0

Postby alphazelf » May 28th, 2019, 3:43 am

I found a bug with apgsearch 5.08 and it is killing me right now. With version 4.984 you could submit hauls up to a massive 10 billion soups. A select few people did that. In version 5.08 when a haul is completed and has over like 2 billion objects, it just shows this error:

Line length exceeds upper limit of 2000 bytes.


I really want this fixed while also being able to start hauls larger than 10 billion objects

Re: apgsearch v5.0

Postby calcyman » May 28th, 2019, 5:38 am

alphazelf wrote:I found a bug with apgsearch 5.08 and it is killing me right now. With version 4.984 you could submit hauls up to a massive 10 billion soups. A select few people did that. In version 5.08 when a haul is completed and has over like 2 billion objects, it just shows this error:

Line length exceeds upper limit of 2000 bytes.


I really want this fixed while also being able to start hauls larger than 10 billion objects


Fixed in version 5.09. (By the way, the bug you reported only happens when single-core searching.)

Re: apgsearch v5.0

Postby testitemqlstudop » May 29th, 2019, 2:18 am

alphazelf wrote:A select few people did that.


yeah me i did that three times
