apgsearch v4.0
- testitemqlstudop
- Posts: 1367
- Joined: July 21st, 2016, 11:45 am
- Location: in catagolue
Re: apgsearch v4.0
I don't think my compiler optimizations should affect certain rules...
Re: apgsearch v4.0
calcyman wrote: Which rule are you using, and on what CPU architecture?
It was actually Conway's Game of Life both times, with D8_1 and D8_4 symmetries. I should've clarified that, sorry. The processor I'm using is an AMD FX-6300.
Re: apgsearch v4.0
Ian07 wrote: It was actually Conway's Game of Life both times, with D8_1 and D8_4 symmetries. I should've clarified that, sorry. The processor I'm using is an AMD FX-6300.
Thanks!
As before, my advice is the same as I gave to Apple Bottom:
- Run 'make clean' before ./recompile.sh;
- Try experimenting with a subset of the new optimisation flags to see which one is culpable.
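The second suggestion can be done systematically by bisection. Below is a minimal Python sketch that assumes exactly one flag is responsible. `build_is_broken` is a hypothetical callback you would write yourself. For example, it could run 'make clean', rebuild with only the given subset of flags, run a short search, and compare the results against a known-good build. This thread doesn't say how to pass a custom flag set to ./recompile.sh, so that part is left to the callback.

```python
def find_culprit_flag(flags, build_is_broken):
    """Bisect a list of compiler flags down to the one that breaks the build.

    Assumes exactly one flag is responsible, so that build_is_broken(subset)
    is True precisely when `subset` contains that flag.
    """
    candidates = list(flags)
    if not build_is_broken(candidates):
        raise ValueError("the full flag set does not reproduce the problem")
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Keep whichever half still reproduces the problem.
        candidates = half if build_is_broken(half) else candidates[len(half):]
    return candidates[0]


# Example flags (taken from the v4.88 compile line later in this thread) and a
# mock callback standing in for a real rebuild-and-test step.
FLAGS = ["-O3", "-flto", "-funsafe-loop-optimizations",
         "-frename-registers", "-march=native"]
print(find_culprit_flag(FLAGS, lambda subset: "-frename-registers" in subset))
```

Each step halves the candidate list, so the sketch needs only about log2(number of flags) rebuilds plus the initial check. If the problem only appears when two flags are combined, this simple bisection can fail. A more general search, such as delta debugging, would be needed in that case.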
What do you do with ill crystallographers? Take them to the mono-clinic!
- Apple Bottom
- Posts: 1034
- Joined: July 27th, 2015, 2:06 pm
Re: apgsearch v4.0
I'm also seeing a new blip on the proverbial radar with 4.87-ll2.1.12: soups in B2k37/S1e2an3-k6n78 die out completely without leaving any debris behind. (I did run 'make clean' this time; didn't fiddle with the compiler flags yet, but since it worked fine in 4.86-ll2.1.11 I reckon this may be due to the latest lifelib changes instead.)
(I guess that explains that 80% increase in search speed...)
If you speak, your speech must be better than your silence would have been. — Arabian proverb
Catagolue: Apple Bottom • Life Wiki: Apple Bottom • Twitter: @_AppleBottom_
Proud member of the Pattern Raiders!
Re: apgsearch v4.0
Apple Bottom wrote: I'm also seeing a new blip on the proverbial radar with 4.87-ll2.1.12: soups in B2k37/S1e2an3-k6n78 die out completely without leaving any debris behind. (I did run 'make clean' this time; didn't fiddle with the compiler flags yet, but since it worked fine in 4.86-ll2.1.11 I reckon this may be due to the latest lifelib changes instead.)
(I guess that explains that 80% increase in search speed...)
Thanks! Fixed in apgluxe v4.88-ll2.1.13 (commit d2a3344e).
Re: apgsearch v4.0
Some of the flags in v4.88-ll2.1.13 throw warnings for me:
Code: Select all
g++ -c -Wall -Wextra -pedantic -O3 -flto -funsafe-loop-optimizations -Wunsafe-loop-optimizations -frename-registers -march=native --std=c++11 main.cpp -o main.o
clang: warning: optimization flag '-funsafe-loop-optimizations' is not supported [-Wignored-optimization-argument]
clang: warning: optimization flag '-frename-registers' is not supported [-Wignored-optimization-argument]
warning: unknown warning option '-Wunsafe-loop-optimizations'; did you mean '-Wunavailable-declarations'? [-Wunknown-warning-option]
1 warning generated.
g++ -flto -pthread main.o includes/md5.o includes/happyhttp.o -o apgluxe
clang: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]
rm -f main.op includes/md5.op includes/happyhttp.op apgluxe-profile *.gcda */*.gcda
true
true oo o
true oo ooo
true o
true oo ooo
true o o
true o o
true o
apgluxe v4.88-ll2.1.13: Rule b3s23 is correctly configured.
apgluxe v4.88-ll2.1.13: Symmetry C1 is correctly configured.
Greetings, this is apgluxe v4.88-ll2.1.13, configured for b3s23/C1.
It still apgsearches as normal after that, though. If it's an issue of those flags being compatible with gcc but not clang, I'm not sure why that would even be happening as I have gcc.
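The warnings in this log are prefixed "clang:". That suggests the `g++` on this machine is actually clang under another name, and the later reply in this thread also treats the compiler as clang. A quick check is to look at the output of `g++ --version`. The minimal Python sketch below assumes that clang's banner contains the word "clang" and that GCC's banner names GCC or the Free Software Foundation. This holds for common builds but is not guaranteed. The function names are hypothetical.

```python
import shutil
import subprocess


def identify_compiler(version_text):
    """Classify `<compiler> --version` output as 'clang', 'gcc' or 'unknown'."""
    text = version_text.lower()
    if "clang" in text:  # e.g. "Apple LLVM version ... (clang-...)"
        return "clang"
    if "free software foundation" in text or "gcc" in text:
        return "gcc"
    return "unknown"


def what_is_gxx():
    """Run the `g++` found on PATH and classify it (None if there is none)."""
    path = shutil.which("g++")
    if path is None:
        return None
    out = subprocess.run([path, "--version"], capture_output=True, text=True).stdout
    return identify_compiler(out)


print(what_is_gxx())
```

One common case is macOS, where the Xcode command-line tools install `g++` as a front end to Apple clang. There, this prints `clang` even though the command is called g++. `capture_output` requires Python 3.7 or later.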
Re: apgsearch v4.9
I've released v4.9-ll2.1.14, which is 1.5x faster than v4.88-ll2.1.13.
The difference is due to escaping glider detection: if a rule is outer-totalistic and supports gliders, then any sufficiently isolated escaping gliders on the edge of the pattern's bounding diamond will be removed since they cannot subsequently interact with the rest of the pattern.
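Below is a simplified sketch of the geometric test described above, written in Python for illustration. It is not apgluxe's implementation. It assumes the glider and its heading have already been recognised, and the isolation threshold `margin` is a hypothetical parameter. The bounding diamond is described by the extremes of the diagonal coordinates x+y and x-y. A glider heading in direction (dx, dy) is checked against the extreme of dx*x + dy*y, which is the diamond edge on the side it is travelling towards.

```python
def is_escaping_glider(glider, heading, pattern, margin=2):
    """Return True if `glider` sits alone on the outer edge of `pattern`'s
    bounding diamond, heading outward (hypothetical sketch).

    glider  -- set of (x, y) live cells already recognised as a glider
    heading -- (dx, dy) with dx, dy in {-1, +1}: the glider's diagonal direction
    pattern -- set of all live cells, including the glider's
    margin  -- hypothetical isolation threshold (Chebyshev distance, in cells)
    """
    dx, dy = heading

    def u(cell):
        # Diagonal coordinate along the heading; its maximum over the pattern
        # is one of the four edges of the bounding diamond.
        return dx * cell[0] + dy * cell[1]

    rest = pattern - glider
    if not rest:
        return True  # nothing left for the glider to interact with
    # The glider must lie strictly beyond every other cell along its heading...
    on_edge = min(map(u, glider)) > max(map(u, rest))
    # ...and be separated from the rest of the pattern by more than `margin`.
    isolated = all(
        max(abs(x1 - x2), abs(y1 - y2)) > margin
        for (x1, y1) in glider
        for (x2, y2) in rest
    )
    return on_edge and isolated


# A glider heading down-right (y increases downwards) far from a block.
GLIDER = {(11, 10), (12, 11), (10, 12), (11, 12), (12, 12)}
BLOCK = {(0, 0), (1, 0), (0, 1), (1, 1)}
print(is_escaping_glider(GLIDER, (1, 1), GLIDER | BLOCK))
```

The check on heading matters. The same glider labelled as travelling up-left (towards the block) fails the edge test, and a copy placed right next to the block fails the isolation test.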
Re: apgsearch v4.9
calcyman wrote: I've released v4.9-ll2.1.14, which is 1.5x faster than v4.88-ll2.1.13.
The difference is due to escaping glider detection: if a rule is outer-totalistic and supports gliders, then any sufficiently isolated escaping gliders on the edge of the pattern's bounding diamond will be removed since they cannot subsequently interact with the rest of the pattern.
Yikes. I'm seeing just over a 50% speed improvement, as promised.
The only thing I'm a little worried by is the occasional wildly inaccurate estimate for methuselah longevity. I don't remember seeing anything with quite this large a mismatch in previous builds:
Code: Select all
Soup l_WiSegGLcr9c7583691 lasts an estimated 28270 generations; rerunning...
Soup l_WiSegGLcr9c7583691 actually lasts 36 generations.
Re: apgsearch v4.9
dvgrn wrote: Yikes. I'm seeing just over a 50% speed improvement, as promised.
The only thing I'm a little worried by is the occasional wildly inaccurate estimate for methuselah longevity. I don't remember seeing anything with quite this large a mismatch in previous builds:
Code: Select all
Soup l_WiSegGLcr9c7583691 lasts an estimated 28270 generations; rerunning...
Soup l_WiSegGLcr9c7583691 actually lasts 36 generations.
As I suspected, it's because it contains a period-8 oscillator: https://catagolue.appspot.com/hashsoup/ ... 3691/b3s23
I've removed the piece of code that was causing this anomaly, and it seems to have made no difference to searching speed:
https://gitlab.com/apgoucher/apgmera/co ... 3bb2a7d2b7
Interestingly, it looks like the --profile flag now has no visible effect on searching speed (at least on my AVX2 machine). I can imagine that the glider removal is affecting it by reducing the upattern's memory consumption (and therefore L1 cache misses).
I'm about to perf it on my AVX-512 machine to see whether anything has changed (other than the raw speed).
EDIT: Curiouser and curiouser. The --profile optimisation delivers a significant benefit on my AVX-512 machine (from 9017 soups/sec to 9428 soups/sec), but not on my AVX2 machine. Here's the printout without profiling:
Code: Select all
b3s23/C1: 1000000 soups completed (8977.555 soups/second current, 9017.179 overall).
----------------------------------------------------------------------
1000000 soups completed.
Attempting to contact payosha256.
testing mode
testing
Connection was successful; starting new search...
----------------------------------------------------------------------
New seed: l_WMKAxyqQG9Xh; iterations = 1; quitByUser = 0
Terminating...
Performance counter stats for './apgluxe -n 1000000 -t 1 -v 0 -s test':
110900.704765 task-clock (msec) # 0.999 CPUs utilized
138 context-switches # 0.001 K/sec
0 cpu-migrations # 0.000 K/sec
10,521 page-faults # 0.095 K/sec
384,053,084,119 cycles # 3.463 GHz
730,759,750,253 instructions # 1.90 insn per cycle
85,375,418,692 branches # 769.837 M/sec
4,444,271,844 branch-misses # 5.21% of all branches
110.994994644 seconds time elapsed
And here's the printout with profiling:
Code: Select all
b3s23/C1: 1000000 soups completed (9416.388 soups/second current, 9428.049 overall).
----------------------------------------------------------------------
1000000 soups completed.
Attempting to contact payosha256.
testing mode
testing
Connection was successful; starting new search...
----------------------------------------------------------------------
New seed: l_7TDp2JjnLps4; iterations = 1; quitByUser = 0
Terminating...
Performance counter stats for './apgluxe -n 1000000 -t 1 -v 0 -s test':
106068.101402 task-clock (msec) # 0.999 CPUs utilized
131 context-switches # 0.001 K/sec
0 cpu-migrations # 0.000 K/sec
10,517 page-faults # 0.099 K/sec
367,314,725,588 cycles # 3.463 GHz
663,194,699,312 instructions # 1.81 insn per cycle
67,919,373,179 branches # 640.337 M/sec
4,273,100,815 branch-misses # 6.29% of all branches
106.166132540 seconds time elapsed
- testitemqlstudop
Re: apgsearch v4.9
HOLY COW
Great job, I need to update when I get home!
Are you sure profiling doesn't do anything on AVX2?
EDIT: There seems to be more branch misses with profiling, however. I'm not sure this is the expected behaviour of profiling - it should decrease branch misses.
By the way, when I was playing around a little with optimization flags (yet again), -mfpmath=both gives the (old) apgluxe a +100 soups/second, at the cost of (I heard from the gcc manuals) an iota more memory.
Useful?
Re: apgsearch v4.9
testitemqlstudop wrote: HOLY COW
Great job, I need to update when I get home!
Are you sure profiling doesn't do anything on AVX2?
EDIT: There seems to be more branch misses with profiling, however. I'm not sure this is the expected behaviour of profiling - it should decrease branch misses.
The *number* of branch misses has decreased with profiling; it's the proportion (out of total branches) that increased slightly.
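The difference between the count and the proportion is easy to check from the two perf printouts. Here is a small Python calculation that uses only the counter values shown there:

```python
# Counter values copied from the two `perf stat` printouts above.
RUNS = {
    "without --profile": dict(soups_per_sec=9017.179, cycles=384_053_084_119,
                              instructions=730_759_750_253,
                              branches=85_375_418_692,
                              branch_misses=4_444_271_844),
    "with --profile": dict(soups_per_sec=9428.049, cycles=367_314_725_588,
                           instructions=663_194_699_312,
                           branches=67_919_373_179,
                           branch_misses=4_273_100_815),
}


def miss_rate(run):
    """Fraction of executed branches that were mispredicted."""
    return run["branch_misses"] / run["branches"]


for name, run in RUNS.items():
    ipc = run["instructions"] / run["cycles"]
    print(f"{name}: {run['branch_misses']:,} misses = "
          f"{100 * miss_rate(run):.2f}% of branches, IPC {ipc:.2f}")

base, prof = RUNS["without --profile"], RUNS["with --profile"]
print(f"speedup: {prof['soups_per_sec'] / base['soups_per_sec']:.3f}x")
```

The profiled build executes about 17.5 billion fewer branches but makes only about 171 million fewer mispredictions. As a result, the miss count falls while the miss rate rises. The computed rates (5.21% and 6.29%) and IPC values (1.90 and 1.81) agree with perf's own annotations.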
I did another run of perf report to find out where the hotspots were, and found a further inefficiency. After amending it* (v4.91-ll2.1.15) it increases the AVX-512 speed from 9428 to 10140 soups/second.
* basically I found that a conditional within the implementation of my indirected_map data structure tested the same condition as one in the lifelib code, so I broke the layer of abstraction in order to combine them into the same conditional block.
I'm rerunning perf to see whether anything else can be done.
- testitemqlstudop
Re: apgsearch v4.0
I'm home now.
Wow, I get a 70% speed increase:
Code: Select all
Using seed l_rkjhCzsjjPnE
Running 10000000 soups per haul:
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
b3s23/C1: 75592 soups completed (7559.052 soups/second current, 7559.052 overall).
Linear-growth pattern detected: yl144_1_16_afb5f3db909e60548f086e22ee3353ac
b3s23/C1: 150994 soups completed (7540.198 soups/second current, 7549.616 overall).
EDIT: I just realized the credits section isn't in the README; it's on Catagolue?
testitemqlstudop wrote: By the way, when I was playing around a little with optimization flags (yet again), -mfpmath=both gives the (old) apgluxe a +100 soups/second, at the cost of (I heard from the gcc manuals) an iota more memory.
Useful?
EDIT 2: I don't think you fixed dvgrn's problem completely:
Code: Select all
Soup l_D2nHNvsxKXRa2318642 lasts an estimated 24730 generations; rerunning...
Soup l_D2nHNvsxKXRa2318642 actually lasts 5587 generations.
Code: Select all
Soup l_D2nHNvsxKXRa5405924 lasts an estimated 24730 generations; rerunning...
Soup l_D2nHNvsxKXRa5405924 actually lasts 5200 generations.
Not coincidentally, they are both right after "Rare oscillator: xp8_***"
Re: apgsearch v4.9
As I've said, apgluxe hardly uses any floating-point arithmetic. But you do raise a good point, which is that (even in AVX) there's a separate single-precision ALU, double-precision ALU, and integer ALU, and there are instructions duplicated between them. For instance, for bitwise AND, you can use either vpand, vandps, or vandpd. Apparently there might be benefit from mixing and matching them.testitemqlstudop wrote:By the way, when I was playing around a little with optimization flags (yet again), -mfpmath=both gives the (old) apgluxe a +100 soups/second, at the cost of (I heard from the gcc manuals) an iota more memory.
Useful?
Re: apgsearch v4.0
On an AVX1 machine, I can also report a major improvement going from v4.88 to v4.91: from ~2700 soups/sec/core to ~4000 soups/sec/core for b3s23/C1. Great!
Most of the v4.88 warnings have also disappeared, with only the "pthread" one remaining. Am I understanding it correctly that this speed improvement would apply to any rule with xq4_153, but not extend to other spaceships?
EDIT: b36is234j/C1, a rule in which xq4_153 is also the most common spaceship, does not show any such improvement, with the v4.88 and v4.91 search speeds being about the same (~700 soups/sec/core).
Worryingly, b3-ekqr4nt5r6is02-c3/C1, another rule with xq4_153 as the most common spaceship, has actually gotten slightly slower in v4.91, from ~4900 soups/sec/core to ~4600 soups/sec/core.
EDIT 2: Oh right, it said "outer-totalistic" also. So it would apply only to outer-totalistic rules with xq4_153? That still doesn't explain why the non-totalistic rule got slower, though.
Re: apgsearch v4.0
77topaz wrote: On an AVX1 machine, I can also report a major improvement going from v4.88 to v4.91: from ~2700 soups/sec/core to ~4000 soups/sec/core for b3s23/C1. Great!
Excellent to hear that!
Most of the v4.88 warnings have also disappeared, with only the "pthread" one remaining.
Yes, I moved some of the more hardcore optimisation flags into the optional set that's only activated by './recompile.sh --profile'. I guess that -pthread warning is harmless; clang must already include the pthreads library by default as otherwise there would be a linker error.
Am I understanding it correctly that this speed improvement would apply to any rule with xq4_153, but not extend to other spaceships?
Correct. It only applies to 'Glider 115', i.e. these 256 rules: http://fano.ics.uci.edu/ca/rules/b3s23/g2.html
See the lines in mkparams.py:
Code: Select all
if (re.match('b36?7?8?s0?235?6?7?8?$', rulestring)):
g.write('#define GLIDERS_EXIST 1\n')
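The regular expression above can be checked mechanically. The short Python sketch below enumerates every rulestring it accepts, assuming apgluxe's lowercase canonical form with digits in ascending order. Birth is 3 plus any subset of {6, 7, 8}. Survival is 23 plus an optional 0 and any subset of {5, 6, 7, 8}. That gives 8 x 32 = 256 rules, which matches the count above. The sketch also checks the other rules mentioned in this thread.

```python
import itertools
import re

# The pattern quoted from mkparams.py above.
GLIDER_RULES = re.compile(r'b36?7?8?s0?235?6?7?8?$')


def glider_115_rules():
    """Enumerate every rulestring the pattern accepts, in canonical digit order."""
    rules = []
    for birth in itertools.product([False, True], repeat=3):
        b = "3" + "".join(d for d, keep in zip("678", birth) if keep)
        for keep0, *extra in itertools.product([False, True], repeat=5):
            s = ("0" if keep0 else "") + "23" + "".join(
                d for d, keep in zip("5678", extra) if keep)
            rules.append(f"b{b}s{s}")
    return rules


RULES = glider_115_rules()
print(len(RULES), all(GLIDER_RULES.match(r) for r in RULES))
for rule in ["b3s23", "b35s2467", "b36is234j", "b3-ekqr4nt5r6is02-c3"]:
    print(rule, bool(GLIDER_RULES.match(rule)))
```

Only b3s23 matches among the four examples. This is consistent with the replies above: the escaping-glider removal is not what sped up B35/S2467, and it does not apply to the two non-totalistic rules.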
Re: apgsearch v4.0
77topaz wrote: EDIT 2: Oh right, it said "outer-totalistic" also. So it would apply only to outer-totalistic rules with xq4_153? That still doesn't explain why the non-totalistic rule got slower, though.
Programs can get faster/slower on certain platforms for very unpredictable reasons due to minor changes: sometimes inserting NOP instructions (which do nothing) can actually increase speed, bizarrely. Also, certain CPUs throttle programs if they're too fast (to avoid overheating the CPU). So in general I make changes based on evidence from multiple architectures, rather than just one (and I can't reproduce the effect on my AVX2 machine).
Re: apgsearch v4.0
Hmm... despite not featuring CGoL's glider, the outer-totalistic rule B35/S2467 also seems to have gained a significant speed increase between v4.86 and v4.91. I don't have the v4.86 search speed on hand anymore, but hauls of 5m soups took between four and four and a half minutes (and in v4.72 around five), but in v4.91 they take just two and a half minutes (with search speed ~34k soups/sec).
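As a rough consistency check of the haul times above, the implied average rates can be computed directly. This sketch ignores any per-haul overhead, such as uploading.

```python
HAUL_SOUPS = 5_000_000  # haul size quoted in the post


def soups_per_second(minutes):
    """Average rate implied by completing one haul in `minutes` minutes."""
    return HAUL_SOUPS / (minutes * 60)


v472 = soups_per_second(5.0)
v486_fast, v486_slow = soups_per_second(4.0), soups_per_second(4.5)
v491 = soups_per_second(2.5)
print(f"v4.72 ~{v472:,.0f}/s, v4.86 {v486_slow:,.0f}-{v486_fast:,.0f}/s, "
      f"v4.91 ~{v491:,.0f}/s")
print(f"v4.86 -> v4.91 speedup: {v491 / v486_fast:.2f}x to {v491 / v486_slow:.2f}x")
```

The v4.91 figure works out to roughly 33,000 soups/second, which is in line with the reported ~34k. The implied v4.86 to v4.91 improvement is between 1.6x and 1.8x.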
Re: apgsearch v4.0
77topaz wrote: Hmm... despite not featuring CGoL's glider, the outer-totalistic rule B35/S2467 also seems to have gained a significant speed increase between v4.86 and v4.91. I don't have the v4.86 search speed on hand anymore, but hauls of 5m soups took between four and four and a half minutes (and in v4.72 around five), but in v4.91 they take just two and a half minutes (with search speed ~34k soups/sec).
That could be any combination of the 'faster population determination in upattern' (which basically saves memory accesses), the early exiting when the number of tiles to update reaches zero, the inlined access of indirected_map, and the compiler optimisations made by testitemqlstudop.
Also... did you do an OS or compiler upgrade in the interim? I noticed a moderate speed boost when I upgraded Ubuntu 16.04 to 18.04 on the AVX-512 machine (upgrading gcc and glibc in the process).
Re: apgsearch v4.0
There's now a precompiled Windows binary of apgluxe v4.95 (compiled for b3s23/C1) available from Catagolue:
https://catagolue.appspot.com/apgsearch
It doesn't require Cygwin, and it prompts the user to enter their key, number of hauls, and number of cores over which to parallelise. (And, unlike on Cygwin, parallelisation actually works for this version.) It's a text-based prompt rather than a GUI, but should still be much more user-friendly than requiring the user to download Cygwin.
Aside: If you want to compile a native Windows executable for an arbitrary rule/symmetry combination, then you'll need to install mingw-w64 on Cygwin/Linux and run:
Code: Select all
./recompile.sh --mingw --rule b38s23 --symmetry C1
which will build apgluxe for native Windows. (On Linux, this command will amusingly return a nonzero exit code, simply because it tries to run the executable and inevitably fails due to binary incompatibility.)
Re: apgsearch v4.0
This may be the stupidest question I've ever asked, but could a version of apgsearch be made for the Nintendo DSi or 3DS? I have a number of them sitting about, and it'd be cool to put them to good use while they're not otherwise being used.
Help wanted: How can we accurately notate any 1D replicator?
Re: apgsearch v4.0
The executable worked fine for B3/S23/C1 for me, and parallelization actually worked properly for once! However, I couldn't find the mingw-w64 package for Cygwin in the list when I tried to reinstall it, instead seeing various packages prefixed with mingw64.
Also, do non-B3/S23 rules still have the 10M soup limit for hauls? That makes sense for Conway's Game of Life, but it's definitely excessive for really slow rules like Omosso.
One more minor thing; the color codes don't work in the Windows terminal:
Code: Select all
Greetings, this is [1;33mapgluxe v4.95-ll2.1.15[0m, configured for [1;34mb3s23/C1[0m.
[32;1mLifelib version:[0m ll2.1.15
[32;1mCompiler version:[0m 6.3.0 20170516
[32;1mPython version:[0m '2.7.13 (default, Sep 26 2018, 18:42:22) [GCC 6.3.0 20170516]'
Peer-reviewing hauls:
No more hauls to verify.
Peer-review complete; proceeding search.
Using seed l_3De9txKtUB9r
Instruction set [1mAVX1[0m detected
0 soups processed...
100000 soups processed...
Linear-growth pattern detected: [1;32myl144_1_16_afb5f3db909e60548f086e22ee3353ac[0m
200000 soups processed...
Linear-growth pattern detected: [1;32myl144_1_16_afb5f3db909e60548f086e22ee3353ac[0m
300000 soups processed...
Linear-growth pattern detected: [1;32myl144_1_16_afb5f3db909e60548f086e22ee3353ac[0m
Re: apgsearch v4.0
muzik wrote: This may be the stupidest question I've ever asked, but could a version of apgsearch be made for the Nintendo DSi or 3DS? I have a number of them sitting about, and it'd be cool to put them to good use while they're not otherwise being used.
It appears that the 3DS uses ARM processors (based on a cursory glance of https://3dbrew.org/wiki/Hardware#Common_hardware which I might have misinterpreted) rather than x86_64, so it would require considerable changes to lifelib (essentially writing pure C equivalents of the inline assembly routines).
Ian07 wrote: The executable worked fine for B3/S23/C1 for me, and parallelization actually worked properly for once!
Excellent!
However, I couldn't find the mingw-w64 package for Cygwin in the list when I tried to reinstall it, instead seeing various packages prefixed with mingw64.
It's probably mingw64-g++ that you need.
Also, do non-B3/S23 rules still have the 10M soup limit for hauls?
It's very much a soft limit: you can override it by calling the executable from Command Prompt and passing the usual flags. The main reason is to limit server load on Catagolue, especially if 77topaz's ethicacha idea becomes popular and results in many people running the executable. (There's a hard 100G upper limit, as beyond that a b3s23/C1 haul would begin to exceed the megabyte limit.)
One more minor thing; the color codes don't work in the Windows terminal:
Good observation. I'm unsure as to the best way to address this. If you run the same executable from a different terminal (such as mintty, which is the terminal used by Cygwin, MSYS, and 'Git Bash'), then the ANSI colour codes are correctly interpreted, so it's not as simple as just testing whether it's been compiled for Windows or POSIX.
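One possible workaround is to let the user opt out of colour and strip the escape sequences before printing. The sketch below is in Python purely for illustration, since apgluxe itself is C++. It honours the informal NO_COLOR environment-variable convention. That choice and the function names are assumptions, not anything apgluxe currently does.

```python
import os
import re

# SGR ("Select Graphic Rendition") sequences look like ESC [ params m,
# e.g. ESC[1;33m for bold yellow and ESC[0m to reset.
SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_sgr(text):
    """Remove ANSI colour/style sequences, leaving the plain text."""
    return SGR.sub("", text)


def render(text, environ=None):
    """Keep colour codes unless NO_COLOR is set to a non-empty value."""
    env = os.environ if environ is None else environ
    return strip_sgr(text) if env.get("NO_COLOR") else text


BANNER = ("Greetings, this is \x1b[1;33mapgluxe v4.95-ll2.1.15\x1b[0m, "
          "configured for \x1b[1;34mb3s23/C1\x1b[0m.")
print(render(BANNER, {"NO_COLOR": "1"}))
```

On recent versions of Windows 10, a program can instead ask the console to interpret these sequences by enabling ENABLE_VIRTUAL_TERMINAL_PROCESSING with SetConsoleMode. Older Windows versions lack that option, so an opt-out switch remains useful as a fallback.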
- benetnasch85
- Posts: 31
- Joined: March 17th, 2017, 12:09 am
Re: apgsearch v4.0
Today I've been running v4.92 under cygwin vs. v4.95 in a Windows cmd window on our AVX1 machine, both at low priority.
v4.92 10101101 soups/haul
v4.95 10101106 soups/haul
Using command line options, v4.95 is running 1 to 3% faster, but it doesn't stop on "q".
Entering the options interactively results in a different output scheme (messages every 100000 soups) and may be very slightly faster, but also does not respond to "q".
Re: apgsearch v4.0
benetnasch85 wrote: Today I've been running v4.92 under cygwin vs. v4.95 in a Windows cmd window on our AVX1 machine, both at low priority.
v4.92 10101101 soups/haul
v4.95 10101106 soups/haul
Using command line options, v4.95 is running 1 to 3% faster, but it doesn't stop on "q".
Entering the options interactively results in a different output scheme (messages every 100000 soups) and may be very slightly faster, but also does not respond to "q".
Yes, responding to 'q' only works for POSIX (Cygwin / Linux / Mac); Windows handles terminals in a different manner.
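The sketch below illustrates the difference in Python; it is not apgluxe's actual C++ code. On POSIX, a program can ask select() with a zero timeout whether input is waiting on stdin. On Windows, select() only works on sockets, so console keypresses have to be read through a separate console API such as msvcrt.kbhit(). The function name `quit_requested` is hypothetical.

```python
import os
import select
import sys


def quit_requested(fd=None):
    """Non-blocking check for a pending 'q' keypress.

    POSIX: select() with a zero timeout reports whether `fd` has input ready.
    Windows: console input is not selectable, so the msvcrt console
    functions are used instead (and `fd` is ignored).
    """
    if sys.platform == "win32":
        import msvcrt  # Windows-only module
        while msvcrt.kbhit():
            if msvcrt.getwch() in ("q", "Q"):
                return True
        return False
    if fd is None:
        fd = sys.stdin.fileno()
    ready, _, _ = select.select([fd], [], [], 0)
    if not ready:
        return False
    return b"q" in os.read(fd, 1024).lower()
```

A terminal in its default line-buffered (canonical) mode only delivers input after Enter is pressed. Programs that react to a single keypress usually switch the terminal into cbreak or non-canonical mode first, for example with Python's tty.setcbreak(), and restore the original settings on exit.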
- testitemqlstudop
Re: apgsearch v4.0
On Linux (Lubuntu 18.10), when I run apgluxe (latest) with either "-p 0" or "-p 4", neither responds to pressing "q".