Catagolue Discussion Thread

For general discussion about Conway's Game of Life.
calcyman
Moderator
Posts: 2936
Joined: June 1st, 2009, 4:32 pm

Re: Catagolue Discussion Thread

Post by calcyman » February 1st, 2019, 8:11 pm

testitemqlstudop wrote:Well, I lowered MIN_LOG_DIFFICULTY ("#define MIN_LOG_DIFFICULTY 0x407a400000000000ull") to 0x407a40000000000ull (one less zero)

You've changed the difficulty from 420 (CPU-minutes, which is reasonable) to 3.03 * 10^-289 (which is practically indistinguishable from zero):

https://en.wikipedia.org/wiki/Double-pr ... :_binary64

Here's Python code for experimenting:

Code: Select all

$ python
Python 2.7.15+ (default, Aug 31 2018, 11:56:52) 
[GCC 8.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import struct
>>> struct.unpack('<d', struct.pack('<Q', 0x407a400000000000))
(420.0,)
>>> struct.unpack('<d', struct.pack('<Q', 0x407a40000000000))
(3.032306728693576e-289,)
>>>
and the difficulty would not rise exponentially, but instead linearly
That's the log difficulty (effectively*) which rises linearly; the actual difficulty rises exponentially. But even so, it takes a long time to crawl back up from 3.03 * 10^-289 to 420.

* I'm using the 'computer logarithm' of reinterpret_casting between an IEEE double and a uint64. It's piecewise linear when you look at it locally, but behaves like a true logarithm at long scales. The reason for this as opposed to (say) natural log is that it loses no information (as well as taking 0 machine instructions!).
What do you do with ill crystallographers? Take them to the mono-clinic!

testitemqlstudop
Posts: 1367
Joined: July 21st, 2016, 11:45 am
Location: in catagolue

Post by testitemqlstudop » February 1st, 2019, 9:39 pm

GaAaAaH!
I thought it ended in "ull" - unsigned long long (64-bit unsigned integer), not that it was a double-precision representation! I changed it to 0x4010cccccccccccdull. So that's what the memcpy() is doing in get_difficulty...
Yes, I am stupid.
Thanks for pointing this out!

EDIT: What? There's the exact same behavio(u)r with 4.2 CPU-minutes! I'm changing it to 42...

Post by calcyman » February 1st, 2019, 10:47 pm

testitemqlstudop wrote:GaAaAaH!
I thought it ended in "ull" - unsigned long long (64-bit unsigned integer), not that it was a double-precision representation!
It's effectively both, depending on whether you're interested in the logarithm of the difficulty or the difficulty itself. You can think of the situation as:

log_difficulty ~= 2**52 * log2(difficulty) + 0x3ff0 0000 0000 0000ull

where ~= means 'approximately equal to'. The trick is that log_difficulty (which is a uint64) and difficulty (which is an IEEE 754 double) have identical binary representations.
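As a quick check of the relation above, here is a minimal Python 3 sketch. The helper names `double_to_bits` and `approx_log_difficulty` are made up for this illustration and are not taken from apgsearch. It compares the true bit pattern of a double with the formula: the two agree exactly at powers of two and drift apart by less than a tenth of a doubling in between.

```python
import math
import struct

BIAS = 0x3ff0000000000000  # bit pattern of the double 1.0


def double_to_bits(x):
    """Reinterpret an IEEE 754 double as a 64-bit unsigned integer."""
    return struct.unpack('<Q', struct.pack('<d', x))[0]


def approx_log_difficulty(x):
    """Evaluate 2**52 * log2(x) + BIAS, rounded to the nearest integer."""
    return int(round((2 ** 52) * math.log2(x))) + BIAS


for difficulty in (1.0, 2.0, 4.2, 42.0, 420.0):
    exact = double_to_bits(difficulty)
    approx = approx_log_difficulty(difficulty)
    # Error measured in units of 2**52, i.e. in units of 'one doubling'.
    error = (approx - exact) / 2.0 ** 52
    print('%6.1f  exact=0x%016x  approx=0x%016x  error=%.4f'
          % (difficulty, exact, approx, error))
```

The error column is never negative, because log2(1 + m) >= m for a mantissa fraction m between 0 and 1.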
I changed it to 0x4010cccccccccccdull
4.2 is certainly more sensible than 3.03 * 10^-289. I guess that will begin at roughly 1 haul every 4 minutes (or 1 minute if you're running on four cores) and adjust itself gradually to 1 haul every 10 minutes.
So that's what the memcpy() is doing in get_difficulty...
Precisely! The 'old-fashioned method' is to reinterpret_cast (or C-style cast) between pointers:

https://en.wikipedia.org/wiki/Fast_inverse_square_root

but type-punning through pointers is considered dangerous (it violates the strict-aliasing rule: optimising compilers may assume that differently-typed objects necessarily live in disjoint memory locations). So the 'safe method' is to put a memcpy in there. In many cases the resulting machine code is identical, because the compiler optimises out the call to memcpy. Even if it didn't, I wouldn't be worried in this particular case, because get_difficulty() isn't one of the performance bottlenecks of the program.

Satoshi used a similar floating-point system for the difficulty target in Bitcoin to achieve a high dynamic range, but he used his own convention instead of IEEE 754.
Yes, I am stupid.
Definitely not! You're far more knowledgeable than I was at your age.

It's difficult* to understand other people's code in general, especially a project of this size (lifelib is over 11000 lines of C++ and 3500 lines of Python, and then apgsearch is another 2000 lines of code on top of that). Usually I try to make an effort to document things adequately, but typically only on the polished 'master' branch rather than the experimental work branches.

* Aside: another stratum of difficulty above that would be to understand apgsearch without the source code, just using the 723kB compiled executable. And if you think that's daunting, imagine how difficult genetic engineering must be: the diploid human genome is 6.6 billion base-pairs, i.e. 1.65 GB of information, which is about 2000 times larger still!

Post by calcyman » February 2nd, 2019, 12:02 pm

As an update, I've made the peer-review system avoid people verifying their own hauls. (Obviously this can be bypassed by someone having two payosha256 keys with different displayed names, but it adds another line of defence and, in particular, stops Anonymous from verifying Anonymous.)

Specifically, the system (for 'protected' censuses, i.e. standard-symmetry isotropic 2-state Moore-neighbourhood rules with at least 10^12 objects) is:
  • Choose a haul randomly out of the 100 most recent hauls. If that has the same displayed name as the verifier, then choose another one randomly up to 5 times.
  • If this process still fails to find a haul submitted by a different user, then return the haul if the backlog contains >= 90 hauls, or otherwise inform the verifier that there are no more hauls to process.
The latter condition means that it's impossible to self-verify when there are few hauls in the backlog, but stops the backlog for a single-user census from piling up indefinitely.
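The selection procedure described above can be sketched roughly as follows. This is only an illustration: the function name, the `(displayed_name, haul_id)` layout of the backlog, and the reading of "up to 5 times" as five redraws after the first pick are all assumptions, not the actual Catagolue server code.

```python
import random


def choose_haul_for_review(backlog, verifier_name, rng=random,
                           window=100, max_redraws=5, self_verify_threshold=90):
    """Pick a haul for `verifier_name` to check, or None if nothing is suitable.

    `backlog` is a list of (displayed_name, haul_id) pairs, newest last
    (a hypothetical layout chosen for this sketch).
    """
    if not backlog:
        return None
    recent = backlog[-window:]          # the 100 most recent hauls
    haul = rng.choice(recent)
    redraws = 0
    # Redraw while the haul belongs to the verifier, up to max_redraws times.
    while haul[0] == verifier_name and redraws < max_redraws:
        haul = rng.choice(recent)
        redraws += 1
    if haul[0] != verifier_name:
        return haul
    # Still a self-submitted haul: only allow it when the backlog is large.
    if len(backlog) >= self_verify_threshold:
        return haul
    return None
```

With a backlog of ten hauls all submitted by the verifier, this returns None; with ninety or more, it hands back one of the verifier's own hauls so that a single-user census keeps draining.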

Post by testitemqlstudop » February 2nd, 2019, 7:59 pm

calcyman wrote:The latter condition means that it's impossible to self-verify when there are few hauls in the backlog, but stops the backlog for a single-user census from piling up indefinitely.
How could a single-user census attain 10^12 objects, anyways?
(btw did you see my apgmera merge request)

EDIT: census
Last edited by testitemqlstudop on February 2nd, 2019, 8:21 pm, edited 1 time in total.

77topaz
Posts: 1496
Joined: January 12th, 2018, 9:19 pm

Post by 77topaz » February 2nd, 2019, 8:15 pm

testitemqlstudop wrote:How could a single-user haul attain 10^12 objects, anyways?
(btw did you see my apgmera merge request)
He said single-user census. That is, a census which contains at least 10^12 objects, but at the time examined in this hypothetical scenario has only one user adding hauls to it (how many users contributed to the census in the past is irrelevant).

Post by testitemqlstudop » February 2nd, 2019, 8:22 pm

77topaz wrote:
testitemqlstudop wrote:How could a single-user haul attain 10^12 objects, anyways?
(btw did you see my apgmera merge request)
He said single-user census. That is, a census which contains at least 10^12 objects, but at the time examined in this hypothetical scenario has only one user adding hauls to it (how many users contributed to the census in the past is irrelevant).
Ah, like 1x256?

Post by 77topaz » February 2nd, 2019, 8:37 pm

testitemqlstudop wrote:Ah, like 1x256?
Yeah, the b3s23 higher symmetries that (largely) only carybe contributes to would be the main targets of that exception.

Post by calcyman » February 2nd, 2019, 8:59 pm

testitemqlstudop wrote:(btw did you see my apgmera merge request)
Wow! I'll have to test this, but -- wow -- a 10% improvement is *massive*. Do you know which of the compiler flags are responsible for the speed boost?

This is, of course, *very* exciting! Thank you!

EDIT: Can you determine why there was a build failure? See here for console output: https://gitlab.com/testitem/apgmera/-/jobs/155371934

(It might be either an unsafe optimisation or the fact that the stdin symmetry is reading from stdin.)

Post by testitemqlstudop » February 2nd, 2019, 9:26 pm

calcyman wrote:
testitemqlstudop wrote:(btw did you see my apgmera merge request)
Wow! I'll have to test this, but -- wow -- a 10% improvement is *massive*. Do you know which of the compiler flags are responsible for the speed boost?

This is, of course, *very* exciting! Thank you!

EDIT: Can you determine why there was a build failure? See here for console output: https://gitlab.com/testitem/apgmera/-/jobs/155371934

(It might be either an unsafe optimisation or the fact that the stdin symmetry is reading from stdin.)
Most of it is from -Ofast and -flto, actually (I tested). However, the compile-time profiling may either squeeze in a little more in the way of system-specific optimizations or actually give an overwhelming benefit (I heard 20% on Stack Overflow). It also does several important things, like (apparently) generating probabilities for branch prediction and counting how many times a loop is run.

I'm not sure about the build failure; C1 compiles and runs perfectly for me. Maybe it's one of the flags?

By the way, here are my stats (above is current, below is mine):

Code: Select all

dandan@Lenovo-H3060:~/Documents/apgmera$ ./apgluxe -p 0

Greetings, this is apgluxe v4.86-ll2.1.11, configured for b3s23/C1.

Lifelib version: ll2.1.11
Compiler version: 8.2.0
Python version: '2.7.15+ (default, Oct  2 2018, 22:12:08)  [GCC 8.2.0]'

Peer-reviewing hauls:

No more hauls to verify.
No more hauls to verify.
No more hauls to verify.

Peer-review complete; proceeding search.

Using seed l_23bQMw3bXuuZ
Instruction set AVX2 detected
Running 10000000 soups per haul:
b3s23/C1: 44277 soups completed (4427.131 soups/second current, 4427.131 overall).
b3s23/C1: 88809 soups completed (4453.199 soups/second current, 4440.159 overall).
b3s23/C1: 131891 soups completed (4308.173 soups/second current, 4396.163 overall).
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
b3s23/C1: 175076 soups completed (4318.452 soups/second current, 4376.730 overall).

Code: Select all


dandan@Lenovo-H3060:~/Coding/apgmera$ ./apgluxe -p 0

Greetings, this is apgluxe v4.86-ll2.1.9, configured for b3s23/C1.

Lifelib version: ll2.1.9
Compiler version: 8.2.0
Python version: '2.7.15+ (default, Oct  2 2018, 22:12:08)  [GCC 8.2.0]'

Peer-reviewing hauls:

No more hauls to verify.
No more hauls to verify.
No more hauls to verify.

Peer-review complete; proceeding search.

Using seed l_67BJ5uHP6mkj
Instruction set AVX2 detected
Running 10000000 soups per haul:
b3s23/C1: 45745 soups completed (4574.255 soups/second current, 4574.255 overall).
b3s23/C1: 92406 soups completed (4665.136 soups/second current, 4619.694 overall).
b3s23/C1: 137964 soups completed (4555.680 soups/second current, 4598.355 overall).
b3s23/C1: 186069 soups completed (4810.281 soups/second current, 4651.333 overall).
I actually thought about keeping it for myself, but then I was like "nyehhh, what could this crappy 2-core (2 physical cores + hyperthreading) do anyways, I might as well tell calcyman" lol :lol:
Last edited by testitemqlstudop on February 2nd, 2019, 9:31 pm, edited 1 time in total.

Post by calcyman » February 2nd, 2019, 9:29 pm

This was gitlab_stdin_test that failed, not C1. But now I realise that's probably incompatibility between the '-p' flag and the stdin symmetry.

Also, can you use -s SpeedTest or some other deterministic seed when comparing? That gives a better comparison since it guarantees the same soups are run.

Post by testitemqlstudop » February 2nd, 2019, 9:39 pm

calcyman wrote:This was gitlab_stdin_test that failed, not C1. But now I realise that's probably incompatibility between the '-p' flag and the stdin symmetry.

Also, can you use -s SpeedTest or some other deterministic seed when comparing? That gives a better comparison since it guarantees the same soups are run.
Alright, let me try again

Anyways, I pushed a cursory fix to my repo. (I think you might need to change the CI: now, if you use stdin, you will need to type in RLEs at compile time.)

EDIT:
The difference is 295.333 soups/second.

original (max 4451.008 soups/second)

Code: Select all

dandan@Lenovo-H3060:~/Documents/apgmera$ ./apgluxe -p 0 -s speed -n 100000

Greetings, this is apgluxe v4.86-ll2.1.11, configured for b3s23/C1.

Lifelib version: ll2.1.11 
Compiler version: 8.2.0
Python version: '2.7.15+ (default, Oct  2 2018, 22:12:08)  [GCC 8.2.0]'

Peer-reviewing hauls:

No more hauls to verify.
No more hauls to verify.
No more hauls to verify.

Peer-review complete; proceeding search.

Using seed speed
Instruction set AVX2 detected
Running 100000 soups per haul:
b3s23/C1: 42255 soups completed (4225.385 soups/second current, 4225.385 overall). 
b3s23/C1: 86766 soups completed (4451.008 soups/second current, 4338.190 overall). 
b3s23/C1: 100000 soups completed (4298.174 soups/second current, 4332.848 overall).
---------------------------------------------------------------------- 
100000 soups completed.
Attempting to contact payosha256.
Payosha256 authentication succeeded.
***********************************************
mine (max 4746.341 soups/second)

Code: Select all

dandan@Lenovo-H3060:~/Coding/apgmera$ ./apgluxe -p 0 -s speed -n 100000

Greetings, this is apgluxe v4.86-ll2.1.11, configured for b3s23/C1.

Lifelib version: ll2.1.11
Compiler version: 8.2.0
Python version: '2.7.15+ (default, Oct  2 2018, 22:12:08)  [GCC 8.2.0]'

Peer-reviewing hauls:

No more hauls to verify.
No more hauls to verify.
No more hauls to verify.

Peer-review complete; proceeding search.

Using seed speed
Instruction set AVX2 detected
Running 100000 soups per haul:
b3s23/C1: 46406 soups completed (4640.477 soups/second current, 4640.477 overall).
b3s23/C1: 93870 soups completed (4746.341 soups/second current, 4693.400 overall).
b3s23/C1: 100000 soups completed (4607.058 soups/second current, 4688.011 overall).
----------------------------------------------------------------------
100000 soups completed.
Attempting to contact payosha256.
Payosha256 authentication succeeded.
Haul already exists.
***********************************************
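
For anyone wanting to repeat this comparison, the rates can be pulled out of the console output mechanically. The sketch below is a hypothetical helper (not part of apgmera) that parses the "soups/second" lines from the two runs above and reports both the peak-to-peak difference and the relative change in the final "overall" figure, which is arguably the steadier of the two measures.

```python
import re

# Matches e.g. "(4451.008 soups/second current, 4338.190 overall)".
RATE_RE = re.compile(r'\((\d+\.\d+) soups/second current, (\d+\.\d+) overall\)')


def parse_rates(log_text):
    """Return a list of (current, overall) soups/second pairs."""
    return [(float(c), float(o)) for c, o in RATE_RE.findall(log_text)]


original_log = """
b3s23/C1: 42255 soups completed (4225.385 soups/second current, 4225.385 overall).
b3s23/C1: 86766 soups completed (4451.008 soups/second current, 4338.190 overall).
b3s23/C1: 100000 soups completed (4298.174 soups/second current, 4332.848 overall).
"""

patched_log = """
b3s23/C1: 46406 soups completed (4640.477 soups/second current, 4640.477 overall).
b3s23/C1: 93870 soups completed (4746.341 soups/second current, 4693.400 overall).
b3s23/C1: 100000 soups completed (4607.058 soups/second current, 4688.011 overall).
"""

original_rates = parse_rates(original_log)
patched_rates = parse_rates(patched_log)

# Difference between the best 'current' readings of each run.
peak_gain = max(c for c, _ in patched_rates) - max(c for c, _ in original_rates)
# Relative change in the final 'overall' reading.
overall_gain = patched_rates[-1][1] / original_rates[-1][1] - 1.0

print('peak difference: %.3f soups/second' % peak_gain)
print('overall speed-up: %.1f%%' % (100.0 * overall_gain))
```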

Post by calcyman » February 2nd, 2019, 10:02 pm

I can confirm there's a substantial improvement: adding your -Ofast -flto -funsafe-loop-optimizations -Wunsafe-loop-optimizations -frename-registers increases the speed from 4048 to 4192 soups per second.

Can you see how much the profile-based optimisation adds above that?

Also, please can you revert your stdin change (the one that sets -p to be zero) and then merge the latest apgmera commit (which makes stdin behave like C1 when -p is nonzero) into your repo? Thanks!

Post by testitemqlstudop » February 2nd, 2019, 10:23 pm

calcyman wrote:I can confirm there's a substantial improvement: adding your -Ofast -flto -funsafe-loop-optimizations -Wunsafe-loop-optimizations -frename-registers increases the speed from 4048 to 4192 soups per second.

Can you see how much the profile-based optimisation adds above that?

Also, please can you revert your stdin change (the one that sets -p to be zero) and then merge the latest apgmera commit (which makes stdin behave like C1 when -p is nonzero) into your repo? Thanks!
I included the profile-based optimizations in my last post!

Surprisingly, on my computer, the flags alone (every flag, but no profiling) don't do much:

Code: Select all

dandan@Lenovo-H3060:~/Documents/apgmera$ ./apgluxe -n 100000 -s speed -p 0

Greetings, this is apgluxe v4.86-ll2.1.11, configured for b3s23/C1.

Lifelib version: ll2.1.11
Compiler version: 8.2.0
Python version: '2.7.15+ (default, Oct  2 2018, 22:12:08)  [GCC 8.2.0]'

Peer-reviewing hauls:

Instruction set AVX2 detected
Rare oscillator detected: xp30_w33z8kqrqk8zzzx33
Rare oscillator detected: xp3_695qc8zx33
[***********************************************]
Rare oscillator detected: xp4_ssj3744zw3
Rare oscillator detected: xp4_ssj3744zw3
Rare oscillator detected: xp4_ssj3744zw3
Rare oscillator detected: xp3_695qc8zx33
Rare oscillator detected: xp3_695qc8zx33
Rare oscillator detected: xp6_ccb7w66z066
Rare oscillator detected: xp8_g3jgz1ut
[***********************************************]
No more hauls to verify.

Peer-review complete; proceeding search.

Using seed speed
Running 100000 soups per haul:
b3s23/C1: 43079 soups completed (4307.072 soups/second current, 4307.072 overall).
b3s23/C1: 87699 soups completed (4461.997 soups/second current, 4384.521 overall).
b3s23/C1: 100000 soups completed (4223.437 soups/second current, 4364.040 overall).
----------------------------------------------------------------------
100000 soups completed.
Attempting to contact payosha256.
Payosha256 authentication succeeded.
Haul already exists.
***********************************************
Even more surprisingly, just -Ofast and -flto actually make it slower:

Code: Select all

dandan@Lenovo-H3060:~/Documents/apgmera$ ./apgluxe -p 0 -s speed -n 100000

Greetings, this is apgluxe v4.86-ll2.1.11, configured for b3s23/C1.

Lifelib version: ll2.1.11
Compiler version: 8.2.0
Python version: '2.7.15+ (default, Oct  2 2018, 22:12:08)  [GCC 8.2.0]'

Peer-reviewing hauls:

No more hauls to verify.
No more hauls to verify.
No more hauls to verify.

Peer-review complete; proceeding search.

Using seed speed
Instruction set AVX2 detected
Running 100000 soups per haul:
b3s23/C1: 41519 soups completed (4151.760 soups/second current, 4151.760 overall).
b3s23/C1: 85226 soups completed (4370.293 soups/second current, 4261.024 overall).
b3s23/C1: 100000 soups completed (4242.484 soups/second current, 4258.271 overall).
----------------------------------------------------------------------
100000 soups completed.
Attempting to contact payosha256.
It might just be a system thing; I don't really know. What about your computer?

Post by testitemqlstudop » February 2nd, 2019, 10:50 pm

The build's still not passing .... ?

Post by calcyman » February 2nd, 2019, 11:10 pm

testitemqlstudop wrote:The build's still not passing .... ?
How did you do the merge from my master? You should have lifelib c22a0818 instead of b73e7095 as the submodule.

(To repair, go into the lifelib subdirectory, checkout 'master' and git-pull to update it (to c22a0818), then go back into the parent (apgmera) directory and git-add lifelib.)

Post by testitemqlstudop » February 2nd, 2019, 11:16 pm

calcyman wrote:
testitemqlstudop wrote:The build's still not passing .... ?
How did you do the merge from my master? You should have lifelib c22a0818 instead of b73e7095 as the submodule.

(To repair, go into the lifelib subdirectory, checkout 'master' and git-pull to update it (to c22a0818), then go back into the parent (apgmera) directory and git-add lifelib.)
Yeah, I have that.
EDIT: I have to go to bed now (I'm in New York), if you have further questions you can email me at "w tnm mo (at) g ma i l (do t) co m", no spaces (just to ward off scraper bots).
Last edited by testitemqlstudop on February 2nd, 2019, 11:17 pm, edited 1 time in total.

Post by calcyman » February 2nd, 2019, 11:17 pm

testitemqlstudop wrote:
calcyman wrote:
testitemqlstudop wrote:The build's still not passing .... ?
How did you do the merge from my master? You should have lifelib c22a0818 instead of b73e7095 as the submodule.

(To repair, go into the lifelib subdirectory, checkout 'master' and git-pull to update it (to c22a0818), then go back into the parent (apgmera) directory and git-add lifelib.)
Yeah, I have that.
Your repository says b73e7095: https://gitlab.com/testitem/apgmera

Post by testitemqlstudop » February 2nd, 2019, 11:53 pm

whoops, last-minute updating
:(

Post by calcyman » February 3rd, 2019, 12:33 am

testitemqlstudop wrote:whoops, last-minute updating
:(
I've merged your changes now anyway -- your profiling approach improves it to 4320 on my AVX2 machine! :)

Because the profiling is pretty intrusive, and only suitable for rules where 100000 soups finish relatively quickly, I've made it optional. So to use your profiling acceleration, the invocation is:

./recompile.sh --profile

I've also changed the extension from .O to .op for the profiled objects, because some people (mainly Windows users) have case-insensitive filesystems that would object to having .o and .O coexisting.

Thank you very much for these updates! How would you like to be credited in the README?

EDIT: On my AVX-512 machine, it's better still: soups per second have increased from 6140 to 6584, and the total number of instructions per soup has plummeted from 1080K to 966K:

Code: Select all

$ perf stat -B ./apgluxe -n 1000000 -t 1 -v 0 -s test

Greetings, this is apgluxe v4.86-ll2.1.11, configured for b3s23/C1.

Lifelib version: ll2.1.11
Compiler version: 7.3.0
Python version: '2.7.15rc1 (default, Nov 12 2018, 14:31:15)  [GCC 7.3.0]'

Using seed test
Running 1000000 soups per haul:
Instruction set AVX-512 detected
b3s23/C1: 64401 soups completed (6440.060 soups/second current, 6440.060 overall).
b3s23/C1: 130164 soups completed (6576.199 soups/second current, 6508.121 overall).
b3s23/C1: 196853 soups completed (6667.964 soups/second current, 6561.404 overall).
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
Soup test216747 lasts an estimated 740 generations; rerunning...
Soup test216747 actually lasts 637 generations.
b3s23/C1: 261511 soups completed (6465.779 soups/second current, 6537.497 overall).
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
b3s23/C1: 327370 soups completed (6585.898 soups/second current, 6547.175 overall).
b3s23/C1: 393115 soups completed (6573.912 soups/second current, 6551.630 overall).
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
b3s23/C1: 459748 soups completed (6663.297 soups/second current, 6567.580 overall).
b3s23/C1: 525636 soups completed (6588.743 soups/second current, 6570.224 overall).
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
b3s23/C1: 591645 soups completed (6600.703 soups/second current, 6573.610 overall).
Rare oscillator detected: xp8_gk2gb3z11
b3s23/C1: 656918 soups completed (6527.051 soups/second current, 6568.953 overall).
b3s23/C1: 723084 soups completed (6616.541 soups/second current, 6573.278 overall).
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
b3s23/C1: 788364 soups completed (6527.883 soups/second current, 6569.495 overall).
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
b3s23/C1: 855046 soups completed (6667.639 soups/second current, 6577.044 overall).
b3s23/C1: 921159 soups completed (6610.982 soups/second current, 6579.468 overall).
b3s23/C1: 987972 soups completed (6681.267 soups/second current, 6586.253 overall).
b3s23/C1: 1000000 soups completed (6409.840 soups/second current, 6584.073 overall).
----------------------------------------------------------------------
1000000 soups completed.
Attempting to contact payosha256.
testing mode
testing
Connection was successful; starting new search...
----------------------------------------------------------------------
New seed: l_qXruggaVhGbN; iterations = 1; quitByUser = 0
Terminating...

 Performance counter stats for './apgluxe -n 1000000 -t 1 -v 0 -s test':

     151884.435307      task-clock (msec)         #    0.999 CPUs utilized          
               137      context-switches          #    0.001 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
           121,755      page-faults               #    0.802 K/sec                  
   525,813,323,844      cycles                    #    3.462 GHz                    
   966,647,102,509      instructions              #    1.84  insn per cycle         
    92,968,439,128      branches                  #  612.100 M/sec                  
     6,130,671,887      branch-misses             #    6.59% of all branches        

     151.971671331 seconds time elapsed
(Without the overhead of perf stat, it's 6592 soups per second.)

The perf report is interesting: the two most expensive functions are runkgens (where it spends 65% of the time) and censusSoup (where it spends 11% of the time). Digging deeper, the next easy performance target seems to be the loop in upattern::totalPopulation() which gets inlined into censusSoup. If I restrict that to only iterate over tiles that have changed, it will be far more cache-friendly and we'll pick up another 3% performance improvement (i.e. 6800 soups/second after the change).
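
The headline figures can be re-derived from the raw perf counters quoted above. The following Python 3 snippet is just worked arithmetic on those numbers; the variable names are chosen for this illustration.

```python
# Raw counters copied from the 'perf stat' output above.
instructions = 966647102509
cycles = 525813323844
task_clock_ms = 151884.435307
soups = 1000000

instructions_per_soup = instructions / soups      # roughly 966.6K
insn_per_cycle = instructions / cycles            # perf reports 1.84
clock_ghz = cycles / (task_clock_ms * 1e6)        # cycles per nanosecond; perf reports 3.462

# Relative changes quoted in the post.
instruction_saving = 1.0 - 966.0 / 1080.0         # drop in instructions per soup
speed_gain = 6584.0 / 6140.0 - 1.0                # gain in soups per second

# A further 3% on top of 6592 soups/second.
projected_rate = 6592 * 1.03

print('instructions per soup: %.0f' % instructions_per_soup)
print('instructions per cycle: %.2f' % insn_per_cycle)
print('effective clock: %.3f GHz' % clock_ghz)
print('instruction saving: %.1f%%' % (100 * instruction_saving))
print('speed gain: %.1f%%' % (100 * speed_gain))
print('projected rate: %.0f soups/second' % projected_rate)
```

The projected rate comes out just under 6800 soups/second, consistent with the rounded estimate in the post.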

Post by 77topaz » February 3rd, 2019, 4:25 am

Am I understanding it correctly that the "--update" command works only when the apgluxe version number is updated, and not when only the lifelib version number is? Because it will not let me update from v4.86-ll2.1.9 to v4.86-ll2.1.11, saying that my copy of apgluxe is already up to date.

Apple Bottom
Posts: 1034
Joined: July 27th, 2015, 2:06 pm

Post by Apple Bottom » February 3rd, 2019, 5:04 am

This should probably be in a different thread, but it's related to the above changes. apgluxe 4.86-ll2.1.11 now segfaults when running b2k37s1e2an3-k6n78. 4.86-ll2.1.9 was fine.

-ll2.1.9:

Code: Select all

Greetings, this is apgluxe v4.86-ll2.1.9, configured for b2k37s1e2an3-k6n78/C1.

Lifelib version: ll2.1.9
Compiler version: 7.3.0
Python version: '2.7.14 (default, Oct 31 2017, 21:12:13)  [GCC 6.4.0]'

Using seed l_H9hQubp47xdb
Computing 2^18-entry lookup table...done!
Computing 2^24-entry lookup table...done!
Computing 2^18-entry mixing table...done!
Running 1000000 soups per haul:
Instruction set AVX1 detected
b2k37s1e2an3-k6n78/C1: 51414 soups completed (5141.400 soups/second current, 5141.400 overall).
[...]
-ll2.1.11:

Code: Select all

Greetings, this is apgluxe v4.86-ll2.1.11, configured for b2k37s1e2an3-k6n78/C1.

Lifelib version: ll2.1.11
Compiler version: 7.3.0
Python version: '2.7.14 (default, Oct 31 2017, 21:12:13)  [GCC 6.4.0]'

Peer-reviewing hauls:

Segmentation fault (core dumped)
If you speak, your speech must be better than your silence would have been. — Arabian proverb

Catagolue: Apple Bottom • Life Wiki: Apple Bottom • Twitter: @_AppleBottom_

Proud member of the Pattern Raiders!

muzik
Posts: 5648
Joined: January 28th, 2016, 2:47 pm
Location: Scotland

Post by muzik » February 3rd, 2019, 8:43 am

Can the Generations aliases from LifeViewer be added to the list of rule names on Catagolue? I would commit them myself but I've already got a merge request for other rules and I don't know if I can have multiple merge requests for the same file at the same time without them overwriting each other.

Post by calcyman » February 3rd, 2019, 9:06 am

Apple Bottom wrote:This should probably be in a different thread, but it's related to the above changes. apgluxe 4.86-ll2.1.11 now segfaults when running b2k37s1e2an3-k6n78. 4.86-ll2.1.9 was fine.
I can't reproduce this error on any machine that I've tried, including an AVX1 computer running Cygwin.

Please can you check the following?
  • That the error persists after 'make clean' followed by 'make';
  • Which of the compiler flags is causing the problem? There are four that you need to check: -Ofast -flto -funsafe-loop-optimizations -frename-registers (so you should be able to do a bisection search quite quickly), and remember to 'make clean' followed by 'make' each time you change the flags.
Thanks!
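The bisection over compiler flags suggested above can be sketched in Python as follows. This is a rough illustration under two assumptions: exactly one flag is responsible, and that flag triggers the problem on its own. The `crashes` callback is hypothetical; in practice it would edit the makefile, run 'make clean' followed by 'make', and then run the binary.

```python
FLAGS = ['-Ofast', '-flto', '-funsafe-loop-optimizations', '-frename-registers']


def find_culprit_flag(flags, crashes):
    """Bisect `flags` to find the one flag for which `crashes(subset)` is true.

    `crashes` takes a list of flags and returns True if a build with
    exactly those flags misbehaves (hypothetical callback).
    """
    candidates = list(flags)
    while len(candidates) > 1:
        half = candidates[:len(candidates) // 2]
        if crashes(half):
            candidates = half                       # culprit is in the first half
        else:
            candidates = candidates[len(half):]     # otherwise it is in the rest
    return candidates[0]


# Stand-in predicate for demonstration only; a real one would rebuild and run.
def fake_crashes(flag_subset):
    return '-Ofast' in flag_subset


print(find_culprit_flag(FLAGS, fake_crashes))
```

With four flags this needs only two rebuilds rather than four.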
muzik wrote:Can the Generations aliases from LifeViewer be added to the list of rule names on Catagolue? I would commit them myself but I've already got a merge request for other rules and I don't know if I can have multiple merge requests for the same file at the same time without them overwriting each other.
Yes, and you can also commit to an existing merge request (and it will update).

Post by Apple Bottom » February 3rd, 2019, 9:31 am

calcyman wrote:Please can you check the following?
  • That the error persists after 'make clean' followed by 'make';
  • Which of the compiler flags is causing the problem? There are four that you need to check: -Ofast -flto -funsafe-loop-optimizations -frename-registers (so you should be able to do a bisection search quite quickly), and remember to 'make clean' followed by 'make' each time you change the flags.
Thanks!
Certainly can!
  • 'make clean' didn't make a difference.
  • Removing '-Ofast', however, did: with just '-flto -funsafe-loop-optimizations -Wunsafe-loop-optimizations -frename-registers' left, I'm not seeing any issues.
IIRC '-Ofast' enables a variety of more aggressive optimizations, each of which gcc also lets you switch on or off individually. Worth digging into, perhaps?
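As a rough sketch of what that digging could look like: going by the GCC manual, '-Ofast' is essentially '-O3' plus '-ffast-math', and '-ffast-math' in turn switches on the sub-flags listed below. Treat that list as an assumption to check against the manual for your gcc version. The Python below only generates candidate flag sets (plain '-O3' build plus one sub-flag at a time); `candidate_builds` and the base flag list are made up for illustration and are not part of apgluxe.

```python
from typing import List, Sequence, Tuple

# Assumption: sub-flags the GCC manual lists under -ffast-math.
# Check this against the documentation for the gcc version in use.
FAST_MATH_PARTS = [
    "-fno-math-errno",
    "-funsafe-math-optimizations",
    "-ffinite-math-only",
    "-fno-rounding-math",
    "-fno-signaling-nans",
    "-fcx-limited-range",
    "-fexcess-precision=fast",
]

# The working configuration from this post, with -O3 standing in for -Ofast.
BASE_FLAGS = ["-O3", "-flto", "-funsafe-loop-optimizations", "-frename-registers"]


def candidate_builds(
    base: Sequence[str] = BASE_FLAGS,
    parts: Sequence[str] = FAST_MATH_PARTS,
) -> List[Tuple[str, List[str]]]:
    """Return (label, flags) pairs: the base flags plus one sub-flag each."""
    return [(part, list(base) + [part]) for part in parts]


for label, flags in candidate_builds():
    print(label, "->", " ".join(flags))
```

If the segfault comes back with exactly one of these, that narrows things down considerably. If none of them triggers it on its own, the problem may need several of them together, or '-Ofast' may do something extra in that gcc version.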

FWIW, removing both '-Ofast' and '-flto' caused gcc to issue a number of warnings:

Code: Select all

$ ./recompile.sh --rule B2k37/S1e2an3-k6n78
Skipping updates; use --update to update apgluxe automatically.
Ensuring lifelib is up-to-date...
Symmetry unspecified; assuming C1.
Configuring rule B2k37/S1e2an3-k6n78; symmetry C1
Warning: B2k37/S1e2an3-k6n78 interpreted as b2k37s1e2an3-k6n78
Valid symmetry: C1
Compressing 512-bit lookup table for rule b2k37s1e2an3-k6n78...
Creating magic sauce for rule b2k37s1e2an3-k6n78...
...completed.
Success!
g++ -c -Wall -Wextra -pedantic -O3 -funsafe-loop-optimizations -Wunsafe-loop-optimizations -frename-registers -march=native --std=c++11  main.cpp -o main.o
In file included from lifelib/upattern.h:13:0,
                 from main.cpp:12:
lifelib/bitworld.h: In function ‘std::string apg::canonise_orientation(std::vector<apg::bitworld>&, int, int, int, int, int, int, int, int)’:
lifelib/bitworld.h:104:31: warning: missed loop optimization, the loop counter may overflow [-Wunsafe-loop-optimizations]
             for (int v = 0; v < ((breadth-1)/5)+1; v++) {
                             ~~^~~~~~~~~~~~~~~~~~~
lifelib/bitworld.h: In function ‘std::string apg::canonise_orientation(std::vector<apg::bitworld>&, int, int, int, int, int, int, int, int)’:
lifelib/bitworld.h:104:31: warning: missed loop optimization, the loop counter may overflow [-Wunsafe-loop-optimizations]
             for (int v = 0; v < ((breadth-1)/5)+1; v++) {
                             ~~^~~~~~~~~~~~~~~~~~~
lifelib/bitworld.h: In function ‘std::string apg::canonise_orientation(std::vector<apg::bitworld>&, int, int, int, int, int, int, int, int)’:
lifelib/bitworld.h:104:31: warning: missed loop optimization, the loop counter may overflow [-Wunsafe-loop-optimizations]
             for (int v = 0; v < ((breadth-1)/5)+1; v++) {
                             ~~^~~~~~~~~~~~~~~~~~~
lifelib/bitworld.h: In function ‘std::string apg::canonise_orientation(std::vector<apg::bitworld>&, int, int, int, int, int, int, int, int)’:
lifelib/bitworld.h:104:31: warning: missed loop optimization, the loop counter may overflow [-Wunsafe-loop-optimizations]
             for (int v = 0; v < ((breadth-1)/5)+1; v++) {
                             ~~^~~~~~~~~~~~~~~~~~~
In file included from main.cpp:13:0:
lifelib/classifier.h: In member function ‘std::vector<std::basic_string<char> > apg::base_classifier<M>::pbbosc(apg::pattern, uint64_t, uint64_t) [with int M = 1]’:
lifelib/classifier.h:112:52: warning: missed loop optimization, the loop counter may overflow [-Wunsafe-loop-optimizations]
                             for (uint64_t i = 0; i <= period; i++) {
                                                  ~~^~~~~~~~~
lifelib/classifier.h: In member function ‘std::vector<std::basic_string<char> > apg::base_classifier<M>::pseudoBangBang(apg::pattern, std::vector<apg::bitworld>*) [with int M = 1]’:
lifelib/classifier.h:239:36: warning: missed loop optimization, the loop counter may overflow [-Wunsafe-loop-optimizations]
             for (uint64_t l = 1; l <= label; l++) {
                                  ~~^~~~~~~~
In file included from lifelib/pattern2.h:2:0,
                 from lifelib/classifier.h:2,
                 from main.cpp:13:
lifelib/lifetree.h: In member function ‘uint64_t apg::lifetree_generic<I, N, J>::bound_recurse(apg::hypernode<I>, int, std::map<std::pair<I, unsigned int>, long unsigned int>*, uint32_t) [with I = unsigned int; int N = 1; J = apg::lifemeta<unsigned int>]’:
lifelib/lifetree.h:723:43: warning: missed loop optimization, the loop counter may overflow [-Wunsafe-loop-optimizations]
                     for (int64_t x = 0; x <= nexp; x++) {
                                         ~~^~~~~~~
lifelib/lifetree.h:739:43: warning: missed loop optimization, the loop counter may overflow [-Wunsafe-loop-optimizations]
                     for (int64_t y = 0; y <= nexp; y++) {
                                         ~~^~~~~~~
lifelib/lifetree.h: In member function ‘uint64_t apg::lifetree_generic<I, N, J>::bound_recurse(apg::hypernode<I>, int, std::map<std::pair<I, unsigned int>, long unsigned int>*, uint32_t) [with I = unsigned int; int N = 2; J = apg::lifemeta<unsigned int>]’:
lifelib/lifetree.h:723:43: warning: missed loop optimization, the loop counter may overflow [-Wunsafe-loop-optimizations]
                     for (int64_t x = 0; x <= nexp; x++) {
                                         ~~^~~~~~~
lifelib/lifetree.h:739:43: warning: missed loop optimization, the loop counter may overflow [-Wunsafe-loop-optimizations]
                     for (int64_t y = 0; y <= nexp; y++) {
                                         ~~^~~~~~~
g++ -c -Wall -Wextra -pedantic -O3 -funsafe-loop-optimizations -Wunsafe-loop-optimizations -frename-registers -march=native --std=c++11  includes/md5.cpp -o includes/md5.o
includes/md5.cpp: In static member function ‘static void MD5::decode(MD5::uint4*, const uint1*, MD5::size_type)’:
includes/md5.cpp:139:37: warning: missed loop optimization, the loop counter may overflow [-Wunsafe-loop-optimizations]
   for (unsigned int i = 0, j = 0; j < len; i++, j += 4)
                                   ~~^~~~~
includes/md5.cpp: In static member function ‘static void MD5::encode(MD5::uint1*, const uint4*, MD5::size_type)’:
includes/md5.cpp:150:34: warning: missed loop optimization, the loop counter may overflow [-Wunsafe-loop-optimizations]
   for (size_type i = 0, j = 0; j < len; i++, j += 4) {
                                ~~^~~~~
includes/md5.cpp: In constructor ‘MD5::MD5(const string&)’:
includes/md5.cpp:274:39: warning: missed loop optimization, the loop counter may overflow [-Wunsafe-loop-optimizations]
     for (i = firstpart; i + blocksize <= length; i += blocksize)
                         ~~~~~~~~~~~~~~^~~~~~~~~
includes/md5.cpp: In member function ‘void MD5::update(const unsigned char*, MD5::size_type)’:
includes/md5.cpp:274:39: warning: missed loop optimization, the loop counter may overflow [-Wunsafe-loop-optimizations]
     for (i = firstpart; i + blocksize <= length; i += blocksize)
                         ~~~~~~~~~~~~~~^~~~~~~~~
includes/md5.cpp: In member function ‘void MD5::update(const char*, MD5::size_type)’:
includes/md5.cpp:274:39: warning: missed loop optimization, the loop counter may overflow [-Wunsafe-loop-optimizations]
     for (i = firstpart; i + blocksize <= length; i += blocksize)
                         ~~~~~~~~~~~~~~^~~~~~~~~
includes/md5.cpp: In function ‘std::string md5(std::string)’:
includes/md5.cpp:274:39: warning: missed loop optimization, the loop counter may overflow [-Wunsafe-loop-optimizations]
     for (i = firstpart; i + blocksize <= length; i += blocksize)
                         ~~~~~~~~~~~~~~^~~~~~~~~
g++ -c -Wall -Wextra -pedantic -O3 -funsafe-loop-optimizations -Wunsafe-loop-optimizations -frename-registers -march=native --std=c++11  includes/happyhttp.cpp -o includes/happyhttp.o
g++ -flto -pthread  main.o includes/md5.o includes/happyhttp.o  -o apgluxe
rm -f main.op includes/md5.op includes/happyhttp.op apgluxe-profile *.gcda */*.gcda
true
true                                                oo o
true                                                oo ooo
true                                                      o
true                                                oo ooo
true                                                 o o
true                                                 o o
true                                                  o
apgluxe v4.86-ll2.1.11: Rule b2k37s1e2an3-k6n78 is correctly configured.
Putting '-flto' back in made those warnings disappear.
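My reading of the warning text (an interpretation, not something checked against gcc's source) is that loops of the form 'for (uint64_t i = 0; i <= period; i++)' can only be treated as finite if the counter is assumed never to wrap: should the bound ever equal the largest value of the type, 'i <= bound' stays true forever. Here is a small Python simulation of the same loop shape with an 8-bit counter, so the wraparound is quick to see. `count_iterations` is a made-up helper, not anything from lifelib.

```python
from typing import Optional


def count_iterations(bound: int, bits: int = 8, limit: int = 1000) -> Optional[int]:
    """Simulate `for (uintN_t i = 0; i <= bound; i++)` with an N-bit counter.

    Returns the number of loop bodies executed, or None if the loop is
    still running after `limit` iterations (treated as non-terminating).
    """
    mask = (1 << bits) - 1
    i = 0
    iterations = 0
    while i <= bound:
        iterations += 1
        if iterations >= limit:
            return None
        i = (i + 1) & mask  # unsigned increment wraps around to 0
    return iterations


print(count_iterations(10))   # 11 iterations: i = 0..10
print(count_iterations(254))  # 255 iterations: i = 0..254
print(count_iterations(255))  # None: i wraps from 255 back to 0
```

In the real code the bounds are presumably nowhere near the maximum of their types, so this is about what the compiler is allowed to assume rather than a bug that anyone is likely to hit.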
If you speak, your speech must be better than your silence would have been. — Arabian proverb

Catagolue: Apple Bottom • Life Wiki: Apple Bottom • Twitter: @_AppleBottom_

Proud member of the Pattern Raiders!
