Can you run multiple instances of it on different accounts on your computer?
I don't know enough about the internal workings of Mac OS X to answer that question. You can certainly run multiple instances, and since a single instance can't be parallelised over multiple cores without OpenMP, you'll want to run one instance per core (so 2 on a dual-core machine, 4 on a quad-core, etc.).
If it's anything like Linux, you might be able to use either disown or nohup, depending on your situation.
What happens if you close the lid of your computer?
I think that depends on your personal settings. My (Windows 7) laptop essentially goes into stasis and is revived when the lid is opened, thereby resuming the search. Again, I don't know how Macs behave.
What makes it 7 times faster? The coding language, or did you execute tasks differently?
That's the question I can answer. The time taken to run a soup in version 1.x (Golly + Python) was roughly 7000 ms on my computer:
- 1400 ms running QuickLife to stabilise soups;
- 5600 ms interpreting Python, talking to Golly, changing rules, recognising patterns, etc.
Clearly the bottleneck is the second of these bullet points. By rewriting everything in C++ and optimising, it no longer requires Python, Golly, or rule-changing, and the 5600 ms can be reduced roughly twentyfold, down to 300 ms:
- 1400 ms running QuickLife to stabilise soups;
- 300 ms doing everything else.
So now it takes 1700 ms per soup, making it four times faster than the original apgsearch. The bottleneck is now running QuickLife, which is a highly-optimised algorithm and therefore hard to improve. However, QuickLife is designed to run arbitrary rules, so it has to process cells serially (well, four at a time, using a 65536-entry lookup table) -- whereas further parallel speedups are possible by hard-coding B3/S23 into bitwise operations (cf. Michael Simkin's LifeAPI).
Hence I wrote a bespoke, highly-optimised algorithm (mostly in x86 assembly!) to take advantage of parallelisation and the processor architecture. It's called Life128 because it processes 128 cells in parallel using Streaming SIMD Extensions, a set of 128-bit vector instructions available on modern x86-64 processors. Here is the source code for Life128 (warning: not for the faint-hearted!):
https://gitlab.com/apgoucher/apgnano/bl ... /life128.h
It splits the universe into slightly-overlapping 32-by-32 tiles and uses the assembly routine to compute each tile 2 generations into the future (computing only the inner 28-by-28 square, hence the overlap). The routine is even faster than I'm making out: each CPU core has multiple ALUs which can work in parallel, so this 882-instruction assembly routine actually runs in about 320 clock cycles!
If a tile is unchanged (i.e. is period-2), then it won't be recalculated until it actually can change (as a result of interference from neighbouring tiles). Since, at any time in the evolution of a methuselah, the majority of the populated portion of the universe is occupied by period-2 ash, this significantly reduces the number of tiles to compute. Another (very minor!) optimisation is staggering the tiles in a brick-wall fashion, so that each tile need only communicate with 6 neighbours rather than 8.
To summarise, the resulting algorithm is twice as fast as QuickLife, so the current breakdown of time per soup resembles this:
- 700 ms running Life128 to stabilise soups;
- 300 ms doing everything else.
There's not really a clear bottleneck any more, and both halves are quite intensively optimised, so I decided to stop there. The total time is now 1000 ms per soup, which is indeed seven times faster than the original version.
Overall, great work! Nice job!
Thanks!