Prototype / demo for a (weighted) collision search program

For scripts to aid with computation or simulation in cellular automata.
oblique
Posts: 122
Joined: July 16th, 2013, 1:30 pm

Prototype / demo for a (weighted) collision search program

Post by oblique » February 7th, 2014, 2:27 pm

So. Here it is now.

A first, admittedly stupid, (very) early prototype - or rather a proof-of-basics demo - of the slow-salvo collision search program I have been developing for projects like viewtopic.php?f=2&t=1274

It can't really do much yet: you can compile it and use it to collide a "bullet" on a number of "lanes" with a "target".

The current version collides every target from a pattern-list file on every possible lane (up to a definable maximum representing the limitations imposed by the construction engine).

It evaluates the collision for (up to) a given number of generations within a "laboratory" of definable size. It then just dumps some debug info about the result and waits for <ENTER> (then it picks the next collision ...)

The next steps will be storing the results in a database and re-inserting all possible collisions with usable (i.e. stable) targets back into the working queue ...

If you like: try it out.

Comments welcome.

User avatar
dvgrn
Moderator
Posts: 10610
Joined: May 17th, 2009, 11:00 pm
Location: Madison, WI
Contact:

Re: Prototype / demo for a (weighted) collision search program

Post by dvgrn » February 7th, 2014, 3:20 pm

https://github.com/jukaiser/sscs/blob/master/BaseIdea wrote:TO CHECK:
- Does it pay off, if we allow non-colliding overlaps between sites?
- Are cases interesting where the next glider has to move thru a neighbouring site?
- Should we allow slow reactions, which take longer than dt to settle?
For #1 and #2, I think it may make sense to define "neighboring site" as the set of lanes where gliders will hit the one-step-back copy (for the neighbor above) or the one-step-ahead copy (for the neighbor below). This should avoid most of the worries about the middle question above -- if a range of glider lanes doesn't affect the neighbor-above or the neighbor-below at all, then those gliders are never moving through a "neighboring site".

There could certainly be cases where a glider might pass through the middle of a neighbor constellation and still hit the original target. I think it will rarely happen that this will produce a useful recipe, though -- too much likelihood of incompatible collisions later on -- so if it's simpler, maybe the search should be limited to contiguous ranges of glider lanes. Maybe a search would still work even with non-contiguous ranges, though.

Collisions between neighboring sites are generally trouble. Let's say you send a glider to transform pattern A into pattern B, and the active reaction touches the not-yet-transformed copy of pattern A directly above it. This changes the copy to a new pattern, A'. The next glider will hit A' instead of A, and the reaction may now be completely different.

There are ways to define "safe" collisions, but it's a little tricky. Two cases come to mind.

First: a given reaction might only modify some part of the neighbor-above constellation that isn't used in the reaction -- i.e., the reaction has the same result whether the modified part is there or not. For example, a spark might delete a neighboring block that was just going to get cleaned up anyway. In a Caterpillar-type spaceship, these columns of reactions are ultimately self-sustaining, so there's no paradox involved -- the block just won't be there.

But if a block gets cleaned up in the neighbor-above, that means it's no longer part of the target (and it won't be in the neighbor-below, either, etc., etc.). So you'd have to notice the interaction and modify the target. This could get to be a bit of a headache, so maybe it's easier to just forbid any such interactions.

Second: a reaction might be modified by a neighbor, leaving the neighbor unaffected -- the easiest example would be a block catalyzing an active reaction. The block recovers and is unharmed, but the reaction is suppressed or modified in some useful way. I don't think there's any particular reason to forbid this kind of interaction.

The #3 question, about settling time, has a clearer answer. In generating these slow salvos, we can delay successive gliders by as long as we want -- simply by moving rephaser/rakes farther down the block chains.

It's reasonable to set some kind of upper limit on settling time, just because patterns that are unsettled for thousands of generations are usually going to leave too much ash to be useful. But the value of dt doesn't have to enter into the calculation, except that the neighbor-above is going to get triggered after dt ticks, and there will probably be trouble if the active reactions come into contact with each other.

Here again, I think it may be theoretically possible to set up a self-sustaining interaction that has a useful effect. But it's probably not worth worrying about in practice. If two neighboring reactions separated by (dx, 0, dt) interact with each other, it's simpler just to prune off that branch of the search tree...!

oblique
Posts: 122
Joined: July 16th, 2013, 1:30 pm

Re: Prototype / demo for a (weighted) collision search program

Post by oblique » February 8th, 2014, 3:47 am

dvgrn wrote: There could certainly be cases where a glider might pass through the middle of a neighbor constellation and still hit the original target. I think it will rarely happen that this will produce a useful recipe, though -- too much likelihood of incompatible collisions later on -- so if it's simpler, maybe the search should be limited to contiguous ranges of glider lanes. Maybe a search would still work even with non-contiguous ranges, though.
For my "script" this is easily detectable: if you have a "fly-by" condition, and the pattern allows more lanes then period of the rake provides, the flown-by glider might hit the "neighbor" on lane "original lane" + "period of ship".

BUT: Does that really work? We don't have a static trail of the current target repeated over and over again. We have a sequence of different construction phases. I would have to cut & paste an example together to really understand what happens ... me too stupid ;)

oblique
Posts: 122
Joined: July 16th, 2013, 1:30 pm

Re: Prototype / demo for a (weighted) collision search program

Post by oblique » February 8th, 2014, 3:57 am

dvgrn wrote: Collisions between neighboring sites are generally trouble. Let's say you send a glider to transform pattern A into pattern B, and the active reaction touches the not-yet-transformed copy of pattern A directly above it. This changes the copy to a new pattern, A'. The next glider will hit A' instead of A, and the reaction may now be completely different.
My idea was more like summing up the affected area of the current reaction and checking whether it overlaps the area affected by the old target (replicated "31" cells above the current site) or by the new target (replicated "31" cells below).

So, for example, block pushers might shift the entire site by a few cells, but that won't hurt if there is no real overlap. Or you might have a small pattern that yields an explosion, but the result of that explosion is so small that neither the old pattern nor the new one would interfere.

That assumes the reaction settles within "240" ticks. If we wanted to consider slow, almost-overlapping reactions, we would have to check phase by phase (or just rerun the reaction with phase-shifted copies above and below - alas, my current algorithm for pat_generate does not "like" large patterns too much).
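
Roughly, the check I have in mind would look like this (just a minimal sketch with made-up names, not the actual sscs code; it approximates the affected areas by bounding boxes and assumes the sites repeat every 31 cells vertically):

Code: Select all

struct BBox {
    int x0, y0, x1, y1;                      // inclusive corners
};

// true if two boxes share at least one cell
bool overlaps(const BBox &a, const BBox &b)
{
    return a.x0 <= b.x1 && b.x0 <= a.x1 &&
           a.y0 <= b.y1 && b.y0 <= a.y1;
}

// shift a box vertically by dy cells
BBox shifted(BBox b, int dy)
{
    b.y0 += dy;
    b.y1 += dy;
    return b;
}

// The reaction is "safe" with respect to its neighbors if its envelope
// touches neither the old target one site above nor the new target one
// site below (site_height = 31 for the current construction engine).
bool neighbor_safe(const BBox &envelope, const BBox &old_target,
                   const BBox &new_target, int site_height)
{
    return !overlaps(envelope, shifted(old_target, -site_height)) &&
           !overlaps(envelope, shifted(new_target, +site_height));
}
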
dvgrn wrote: Second: a reaction might be modified by a neighbor, leaving the neighbor unaffected -- the easiest example would be a block catalyzing an active reaction. The block recovers and is unharmed, but the reaction is suppressed or modified in some useful way. I don't think there's any particular reason to forbid this kind of interaction.
Hmm - that would really mean: replicate and rerun phase-shifted. Expensive. But on the other hand: that is a pretty rare case. If one in a hundred collisions needs to be rechecked in that way, it won't hurt the search too much.

Or: as my program will stuff *every* reaction into the database anyway, we could search for these cases separately with another tool ...
dvgrn wrote: Here again, I think it may be theoretically possible to set up a self-sustaining interaction that has a useful effect. But it's probably not worth worrying about in practice. If two neighboring reactions separated by (dx, 0, dt) interact with each other, it's simpler just to prune off that branch of the search tree...!
As I implied above: "pruned off" does not mean "lost" ... my plan is to record any and every reaction in the database, not only to check for loops (the search tree is not really a tree) but to allow different searches to reuse the reactions already examined in a previous run. The program will just note something like: "hitting a boat with a glider on that lane yields a reaction I followed for 276 generations before I gave up with the resulting mess below".

Self-interfering reactions would have to be marked as such to avoid reusing them.

Another idea is that we could rerun a partial search (every search will be partial, of course - the search space is almost unlimited) and start off where we left off the last time ...

User avatar
dvgrn
Moderator
Posts: 10610
Joined: May 17th, 2009, 11:00 pm
Location: Madison, WI
Contact:

Re: Prototype / demo for a (weighted) collision search program

Post by dvgrn » February 8th, 2014, 11:04 am

oblique wrote:Hmm - that would really mean: replicate and rerun phase-shifted. Expensive. But on the other hand: that is a pretty rare case. If one in a hundred collisions needs to be rechecked in that way, it won't hurt the search too much.
I'd be more worried that dealing with that case correctly would slow down the coding process, than that it would slow down the actual search. A large number of promising Life search programs are lying around unfinished because their designers got just a little bit too ambitious.

Anyway, if this is really a one-in-a-hundred rare case, you can avoid dealing with it for now, and still explore almost all of the search tree. Same with rare (non-P2) oscillators like the pulsar -- they're just not statistically likely to produce useful recipes.
oblique wrote:Or: as my program will stuff *every* reaction into the database anyway, we could search for these cases separately with another tool ...
Yes! I think "worry about it later" is your most valuable programming tool.
oblique wrote:As I implied above: "pruned off" does not mean "lost" ... The program will just note something like: "hitting a boat with a glider on that lane yields a reaction I followed for 276 generations before I gave up with the resulting mess below".
Sounds good. When you get to that point, it might be worth doing a few benchmarks -- seems like it would be quicker to regenerate the "resulting mess" from T=0 if you ever need it again, rather than storing the literal pattern in the database. Maybe store just a hash of the pattern, or just the population count?

A large fraction of the leaves on the search tree will be these big-active-mess patterns, so storing a few bytes instead of several hundred bytes will eventually mean a 100x smaller database. If there's any kind of upper limit on database size, then that will eventually mean more room for deeper searches...?
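
Just to illustrate the hash idea (a sketch only, nothing from the actual program; pattern_hash is a made-up name, FNV-1a is just one reasonable choice, and the live cells are assumed to be already shifted so the bounding box starts at (0,0)):

Code: Select all

#include <cstdint>
#include <vector>
#include <algorithm>
#include <utility>

// 64-bit FNV-1a hash over the sorted live-cell coordinates
std::uint64_t pattern_hash(std::vector<std::pair<int, int>> cells)
{
    std::sort(cells.begin(), cells.end());          // canonical order
    std::uint64_t h = 1469598103934665603ULL;       // FNV offset basis
    auto mix = [&h](int v) {
        for (int i = 0; i < 4; ++i) {
            h ^= (std::uint64_t)((v >> (8 * i)) & 0xff);
            h *= 1099511628211ULL;                  // FNV prime
        }
    };
    for (const auto &c : cells) {
        mix(c.first);
        mix(c.second);
    }
    return h;
}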

oblique
Posts: 122
Joined: July 16th, 2013, 1:30 pm

Re: Prototype / demo for a (weighted) collision search program

Post by oblique » February 9th, 2014, 11:40 am

dvgrn wrote:[...]
Anyway, if this is really a one-in-a-hundred rare case, you can avoid dealing with it for now, and still explore almost all of the search tree. Same with rare (non-P2) oscillators like the pulsar -- they're just not statistically likely to produce useful recipes.

[...]
Yes! I think "worry about it later" is your most valuable programming tool.
I will leave the "collides without harm" case for later. If there turn out to be abundantly many results which seem to explode but which would be useful, this case can still be addressed (later).

As for the pulsar and other P>2 oscillators: the program handles them already. The user can define the starting targets, and he/she can define the maximum period to check for when determining whether a result is considered stable.

The only difference between P2 and P>2 is that the program will look for P2 as soon as it is clear that the initial collision is over (i.e. after fly-by detection). Checking for P3 or P7 there would be pointless, because they are *much* rarer as a collision result. Checking at the end of the simulation, on the other hand, is not that complicated.
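
In code, that end-of-simulation check would be basically something like this (a sketch with hypothetical names; step_one_generation() stands in for whatever one-tick update the program already has, returning the cell list in sorted order):

Code: Select all

#include <vector>
#include <utility>

using Cells = std::vector<std::pair<int, int>>;   // sorted list of live cells
Cells step_one_generation(const Cells &cells);    // assumed helper, defined elsewhere

// Returns the period (1..max_period) if the pattern repeats in place within
// max_period generations, or 0 if it does not (i.e. "not stable").
int find_period(const Cells &start, int max_period)
{
    Cells current = start;
    for (int p = 1; p <= max_period; ++p) {
        current = step_one_generation(current);
        if (current == start)
            return p;
    }
    return 0;
}
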
dvgrn wrote:
oblique wrote:As I implied above: "pruned off" does not mean "lost" ... The program will just note something like: "hitting a boat with a glider on that lane yields a reaction I followed for 276 generations before I gave up with the resulting mess below".
Sounds good. When you get to that point, it might be worth doing a few benchmarks -- seems like it would be quicker to regenerate the "resulting mess" from T=0 if you ever need it again, rather than storing the literal pattern in the database. Maybe store just a hash of the pattern, or just the population count?

A large fraction of the leaves on the search tree will be these big-active-mess patterns, so storing a few bytes instead of several hundred bytes will eventually mean a 100x smaller database.
I did do some benchmarks (there are results in a file at the GitHub link). Calculating RLE strings and storing the result into the database is not that expensive (it depends largely on the database engine, but I chose the fastest one, because I'm not planning to be prepared for earthquakes or for other people using the same database r/w while the program is running).

Besides: the program has to store the result somewhere anyhow; adding another field doesn't look that expensive. Well, I could use some hash table or btree for loop detection, but as I said: the db engine I'm currently using is fast enough for now (plus we get results while the program is still running!).

As for size: even MySQL can handle several hundred GB of data easily ...
And still be efficient.

The only size limit I'm concerned about at the moment is that the RLE string is currently limited to 1000 chars (the maximum size of an index column with the database engine and version of MySQL/MariaDB I use).

If - as I planned - I switch from normal on/off cells to storing "affected cells" with the pattern, that might limit the actual size of the pattern too much. We will see. And worry about that later (TM).

Another problem that could then arise: my pattern generation algorithm is pretty straightforward - concentrate on the bounding box of your pattern and just count all neighbors of all cells. Suitable for running block pushers and such for a few dozen generations. Less suited to tasks like a collision with a 200x300-cell target (and running that for 10000+ generations).
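
(For reference, the core of that generator is essentially the following - a bare-bones sketch, not the actual pat_generate code; it lets the field grow by one cell per generation in every direction:)

Code: Select all

#include <vector>

// grid[y][x] is 0 or 1 inside the current bounding box.
// Each call returns a grid two cells larger in each dimension, so the
// pattern can expand by one cell per generation.
std::vector<std::vector<int>> life_step(const std::vector<std::vector<int>> &grid)
{
    int h = (int)grid.size(), w = (int)grid[0].size();
    std::vector<std::vector<int>> next(h + 2, std::vector<int>(w + 2, 0));
    for (int y = 0; y < h + 2; ++y)
        for (int x = 0; x < w + 2; ++x) {
            int n = 0;                                       // live neighbors
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    if (dy == 0 && dx == 0) continue;
                    int yy = y - 1 + dy, xx = x - 1 + dx;    // map into old grid
                    if (yy >= 0 && yy < h && xx >= 0 && xx < w)
                        n += grid[yy][xx];
                }
            int alive = (y >= 1 && y <= h && x >= 1 && x <= w) ? grid[y - 1][x - 1] : 0;
            next[y][x] = (n == 3 || (alive && n == 2)) ? 1 : 0;   // B3/S23
        }
    return next;
}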

User avatar
dvgrn
Moderator
Posts: 10610
Joined: May 17th, 2009, 11:00 pm
Location: Madison, WI
Contact:

Re: Prototype / demo for a (weighted) collision search program

Post by dvgrn » February 10th, 2014, 9:01 am

oblique wrote:... the program has to store the result somewhere anyhow; adding another field doesn't look that expensive. Well, I could use some hash table or btree for loop detection, but as I said: the db engine I'm currently using is fast enough for now (plus we get results while the program is still running!).

As for size: even MySQL can handle several hundred GB of data easily ...
And still be efficient.
I suppose I'm less worried about the database's capabilities than about the available space on my not-very-new laptop's hard drive. Seems like if I end up running a search locally, there's still a potential tradeoff between storing more detail about each leaf, and being able to look deeper into the search tree.

Suppose that I'm fairly sure that for a particular type of search I'll never want to look twice at any reaction more than (say) half as big or half as slow as an R-pentomino explosion. Then it won't necessarily make sense for my copy of the search database to store RLE for every big pile of random ash -- not when it would be just as good to store a hash of the final result, plus the recipe to reproduce the reaction if I _do_ ever need it.

Paul Chapman's original Glue project needed quite a lot of time to get through even an incomplete scan of the search tree, stopping at nine slow gliders and limiting intermediate targets to the size of a pi explosion. It would be nice to have a way to look deeper into the tree, say somewhere between twelve and fifteen gliders, which might be possible if pi explosions can be forbidden. I suppose those don't really fit in width 31 anyway, though, so they won't cause the same kind of combinatorial explosion that they did in Glue searches.
oblique wrote:The only size limit I'm concerned about at the moment is that the RLE string is currently limited to 1000 chars (the maximum size of an index column with the database engine and version of MySQL/MariaDB I use).

If - as I planned - I switch from normal on/off cells to storing "affected cells" with the pattern, that might limit the actual size of the pattern too much. We will see. And worry about that later (TM).
You mean, keep track of the reaction envelope in the same chunk of RLE, using a multistate rule like LifeHistory? Here are a couple of ideas for later (which you may have thought of already):

1) If you store the affected cells as a separate two-state pattern, as (for example) Hersrch's Herschel-track library does, then the RLE gets much shorter -- you're just storing long stretches of ON cells instead of speckles of different states.

2) There are ways to represent two-state patterns that take up a lot less space than RLE does. Koenig's Small Object Format might give you another factor of two or more.

3) Beyond that you might have to go to some kind of ugly unreadable binary compression, or possibly a structured pattern format like a one-line version of Xlife's #I syntax. Structured patterns would be a whole new set of headaches, but they would allow good compression, and also much better searchability than RLE or SOF -- you could actually do a text search on the database and find all the output patterns that include a block@(3, 5) AND a beehiveA@(8, 13).
oblique wrote:Another problem that could then arise: my pattern generation algorithm is pretty straightforward - concentrate on the bounding box of your pattern and just count all neighbors of all cells. Suitable for running block pushers and such for a few dozen generations. Less suited to tasks like a collision with a 200x300-cell target (and running that for 10000+ generations).
Luckily I'm pretty sure you'd have to run this kind of search for much longer than the age of the universe, before you run out of even sub-50x50 patterns. But I think Paul Chapman's conclusion after lots of Glue benchmarking was that pattern generation was worth optimizing. He ended up putting together a C++ Rlife7.DLL that did a fair amount of clever bit-twiddling to run these kinds of smallish collisions very quickly.

oblique
Posts: 122
Joined: July 16th, 2013, 1:30 pm

Re: Prototype / demo for a (weighted) collision search program

Post by oblique » February 10th, 2014, 2:21 pm

dvgrn wrote: I suppose I'm less worried about the database's capabilities than about the available space on my not-very-new laptop's hard drive. Seems like if I end up running a search locally, there's still a potential tradeoff between storing more detail about each leaf, and being able to look deeper into the search tree.

Suppose that I'm fairly sure that for a particular type of search I'll never want to look twice at any reaction more than (say) half as big or half as slow as an R-pentomino explosion. Then it won't necessarily make sense for my copy of the search database to store RLE for every big pile of random ash -- not when it would be just as good to store a hash of the final result, plus the recipe to reproduce the reaction if I _do_ ever need it.

Paul Chapman's original Glue project needed quite a lot of time to get through even an incomplete scan of the search tree, stopping at nine slow gliders and limiting intermediate targets to the size of a pi explosion. It would be nice to have a way to look deeper into the tree, say somewhere between twelve and fifteen gliders, which might be possible if pi explosions can be forbidden. I suppose those don't really fit in width 31 anyway, though, so they won't cause the same kind of combinatorial explosion that they did in Glue searches.
First: you could alter the DB definition to allow a maximum of 100 chars for the RLE.
Second: I'll make storing "irrelevant" results an option ...
dvgrn wrote: You mean, keep track of the reaction envelope in the same chunk of RLE, using a multistate rule like LifeHistory? Here are a couple of ideas for later (which you may have thought of already):

1) If you store the affected cells as a separate two-state pattern, as (for example) Hersrch's Herschel-track library does, then the RLE gets much shorter -- you're just storing long stretches of ON cells instead of speckles of different states.

2) There are ways to represent two-state patterns that take up a lot less space than RLE does. Koenig's Small Object Format might give you another factor of two or more.

3) Beyond that you might have to go to some kind of ugly unreadable binary compression, or possibly a structured pattern format like a one-line version of Xlife's #I syntax. Structured patterns would be a whole new set of headaches, but they would allow good compression, and also much better searchability than RLE or SOF -- you could actually do a text search on the database and find all the output patterns that include a block@(3, 5) AND a beehiveA@(8, 13).
I will look into these tips, thank you. Binary formats vs. database storage does not sound like fun, though.
As for text search: I currently have a comment column in my "reaction" table which I planned to use exactly for this purpose. But you would not want to waste that space either, I fear.
dvgrn wrote: Luckily I'm pretty sure you'd have to run this kind of search for much longer than the age of the universe, before you run out of even sub-50x50 patterns. But I think Paul Chapman's conclusion after lots of Glue benchmarking was that pattern generation was worth optimizing. He ended up putting together a C++ Rlife7.DLL that did a fair amount of clever bit-twiddling to run these kinds of smallish collisions very quickly.
@Bit-Twiddling: my first self-made Life-simulating program was an Atari ST assembler monster with comparatively huge lookup tables which used almost half the memory - and took the assembler nearly an hour to build. Well: at least it was running at 2 gens per second on a 320x400 or so cell screen (1 pixel = one cell) :lol:
I thought about reimplementing this ... we'll see.

Pattern generation sure is worth optimizing. I'm just not sure that, for example, Hashlife would really be an improvement for short runs of small patterns. One-bit-per-cell methods, on the other hand, might be ...
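
(What I mean by one-bit-per-cell is roughly this - a sketch only, with made-up names, 64 cells per machine word and the word edges simply treated as dead:)

Code: Select all

#include <cstdint>
#include <cstddef>
#include <vector>

// add three 1-bit planes; the per-cell result is a 2-bit number (lo = bit 0, hi = bit 1)
static inline void add3(std::uint64_t a, std::uint64_t b, std::uint64_t c,
                        std::uint64_t &lo, std::uint64_t &hi)
{
    std::uint64_t t = a ^ b;
    lo = t ^ c;
    hi = (a & b) | (t & c);
}

// one B3/S23 generation; each row is one 64-bit word, edges are dead
std::vector<std::uint64_t> life_step_bits(const std::vector<std::uint64_t> &rows)
{
    std::size_t h = rows.size();
    std::vector<std::uint64_t> next(h, 0);
    for (std::size_t y = 0; y < h; ++y) {
        std::uint64_t up   = (y > 0)     ? rows[y - 1] : 0;
        std::uint64_t mid  = rows[y];
        std::uint64_t down = (y + 1 < h) ? rows[y + 1] : 0;

        // horizontal sums (left + center + right) for the rows above and below
        std::uint64_t ulo, uhi, dlo, dhi;
        add3(up << 1,   up,   up >> 1,   ulo, uhi);
        add3(down << 1, down, down >> 1, dlo, dhi);
        // the middle row contributes only its left and right neighbors
        std::uint64_t mlo = (mid << 1) ^ (mid >> 1);
        std::uint64_t mhi = (mid << 1) & (mid >> 1);

        // add the three partial sums into a 4-bit neighbor count per cell
        std::uint64_t s0, c0, s1, c1;
        add3(ulo, mlo, dlo, s0, c0);            // ones place, carry into twos
        add3(uhi, mhi, dhi, s1, c1);            // twos place, carry into fours
        std::uint64_t bit0 = s0;
        std::uint64_t bit1 = c0 ^ s1;
        std::uint64_t carry2 = c0 & s1;
        std::uint64_t bit2 = c1 ^ carry2;
        std::uint64_t bit3 = c1 & carry2;

        // B3/S23: alive next tick if count == 3, or cell is on and count == 2
        next[y] = ~bit3 & ~bit2 & bit1 & (bit0 | mid);
    }
    return next;
}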

User avatar
dvgrn
Moderator
Posts: 10610
Joined: May 17th, 2009, 11:00 pm
Location: Madison, WI
Contact:

Re: Prototype / demo for a (weighted) collision search program

Post by dvgrn » February 11th, 2014, 9:29 am

oblique wrote:@Bit-Twiddling: my first self-made Life-simulating program was an Atari ST assembler monster with comparatively huge lookup tables which used almost half the memory - and took the assembler nearly an hour to build.
Sounds familiar -- my only venture into assembly language was also a Life simulator, on a similarly ancient computer with a clock speed measured in kilohertz.
oblique wrote:Pattern generation sure is worth optimizing. I'm just not sure that, for example, Hashlife would really be an improvement for short runs of small patterns. One-bit-per-cell methods, on the other hand, might be ...
In past experiments along these lines, Hashlife has been significantly slower than, e.g., QuickLife -- not surprisingly. Too much chaos, and nothing much ever repeats... until the pattern settles and gliders are flying away and so forth, but by that point you'll want to have noticed the new predictability and stopped the simulation anyway.

oblique
Posts: 122
Joined: July 16th, 2013, 1:30 pm

Re: Prototype / demo for a (weighted) collision search program

Post by oblique » February 11th, 2014, 6:34 pm

Hmm, another idea for storing stable B3/S23 patterns:
(This looks more complicated than it really is, I fear)

+ use the 95 non-control ASCII chars (should be almost universally usable).
+ one pattern per line (EOL = end-of-pattern)
+ encode numbers n in the following way:
+ if n < 64: use ASCII code 0x20 + n, i.e. n=0 -> SPACE (ASCII 0x20) ... n=63 -> '_' (ASCII 0x5F)
+ if n-64 < 28*95: divide (n-64) by 95, represent the quotient by a char between '`' (ASCII 0x60) and 'z' (ASCII 0x7A)
and the remainder by a char between SPACE (0x20) and '~' (0x7E)
+ If n >= 2724 (i.e. it won't fit the representation above) encode as: '|', number-of-chars-needed, sequence-of-chars
that would be enough for something like n = 95^95 ...
+ define a fixed, carved-in-stone enumeration of subpatterns. Say: a version of the list of common ash particles.
+ each oscillator and each not-perfectly-symmetric pattern gets counted in each phase and orientation.
=> the first 64 symbols get us beyond the top ten (even if we have to count the blinker twice and the boat 4 times!)
+ if we end up with an unknown subpattern, enclose its RLE in '{' and '}'
+ split the pattern into one or more subpatterns, ordered by x-position, then y-position of the top-left cell.
+ For each subpattern store: "x-offset" "y-offset" "subpattern".
+ If the new x is smaller than the previous one, insert a '~' and reset the x-position.

Thus, if the *** phase of the blinker has index 1, the following pattern:

***
...
...
...
...
***

would be encoded as: SP SP '!' SP '$' '!'

and (if block has index 3):

....**
....**
......
......
......
**....
**....

would be: "$ #~ %#"

The main problem with this method would be that it relies on a set-in-stone enumeration of common subpatterns.
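
For what it's worth, the number encoding above in code form (my own sketch only - the exact character ranges are still open for discussion, and the '|' escape for very large numbers is left out):

Code: Select all

#include <string>
#include <stdexcept>

// encode a non-negative number as one or two printable ASCII characters,
// following the scheme sketched above
std::string encode_number(int n)
{
    if (n < 0)
        throw std::invalid_argument("negative");
    if (n < 64)                        // one char: 0x20 (SPACE) .. 0x5F ('_')
        return std::string(1, char(0x20 + n));
    if (n < 64 + 28 * 95) {            // two chars: quotient (from 0x60 '`'), then remainder
        int m = n - 64;
        return std::string(1, char(0x60 + m / 95)) +
               std::string(1, char(0x20 + m % 95));
    }
    // n >= 2724 would use the '|' escape described above (not shown here)
    throw std::out_of_range("use the '|' escape for large numbers");
}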

User avatar
dvgrn
Moderator
Posts: 10610
Joined: May 17th, 2009, 11:00 pm
Location: Madison, WI
Contact:

Re: Prototype / demo for a (weighted) collision search program

Post by dvgrn » February 11th, 2014, 8:12 pm

oblique wrote:Hmm, another idea for storing stable B3/S23 patterns:
[snip scary-looking but straightforward list of ASCII translation tricks...]
Thus, if the *** phase of the blinker has index 1, the following pattern:

***
...
...
...
...
***

would be encoded as: SP SP '!' SP '$' '!' ...
Well, as long as you have good subroutines to translate to and from this format so that nobody but a computer ever has to try and read it, it's certainly your project so I shouldn't complain! I think I'd be a little worried about the detail of encoding zeroes as spaces -- it makes the format not very human-readable, and also not very human-emailable or human-copyable-and-pastable. If you lose one of the two spaces at the beginning, don't you get crazy frame-shift errors?

Just don't post your encoding to the forums without [code] tags around it, I suppose.

Have you had a good look at Koenig's Small Object Format, by the way?  No whitespace headaches, and I liked the way it handles numbers, with 0 = zero through 9 = nine, and then on up the ASCII table from there.  Small patterns often have short enough chunks of ON and OFF cells that the SOF is perfectly human-readable.  Just for comparison, your first pattern would be "3+53", and the blocks one would be "0342-0342+42-2".

[quote="oblique"]The main problem with this method would be that it relies on a set-in-stone enummeration of common subpatterns.[/quote]
That doesn't seem like such a terrible thing.  The search is bound to turn up hundreds of different subpatterns, if you count all phases and orientations.  Why not have a way of appending new unknown objects to the master list, so that they become known objects?  I suppose it could eventually be a communication problem if multiple people are running different searches, and adding different objects to their master lists.

But I definitely like the idea of storing target patterns as sequences of named (or indexed) subpatterns.  Does that mean you're planning to go through each settled output pattern and figure out all of its constituent parts -- as opposed to storing it as an undifferentiated chunk of RLE, and just catching gliders and *WSSes around the edges?

[Assuming no loafers turn up, of course -- that would be nice, but very unlikely. But switch engines should turn up occasionally.]

I've written quick-and-dirty recognizer code a couple of times, and it keeps being more complicated than I think it should be.  My last script could tell that the ON phase of a beacon was one object, but the OFF phase looked like two objects.

User avatar
dvgrn
Moderator
Posts: 10610
Joined: May 17th, 2009, 11:00 pm
Location: Madison, WI
Contact:

Re: Prototype / demo for a (weighted) collision search program

Post by dvgrn » February 11th, 2014, 9:03 pm

Re: pattern generation, in case this seems worth investigating:

Paul Chapman sent me the source code for his RLife8.DLL today. This is the C++ library he used in his Glue and DOpSearch search utilities. It does enough bit-twiddling to be significantly faster than your average one-byte-per-cell neighbor-counting code -- and there are some subroutines in there to handle reaction envelopes, too. Let's see: the "smudge" and "smear" are variants of the "affected cells", or the blue cells in LifeHistory. But now I don't remember which was which...!

The only documentation is the source code, I'm afraid, plus some usage examples in the DOpSearch search utility posted on the Geminoid Challenge thread.
Attachments
Rlife8.zip
Paul Chapman's RLife8.DLL and source code
(53.28 KiB) Downloaded 626 times

oblique
Posts: 122
Joined: July 16th, 2013, 1:30 pm

Re: Prototype / demo for a (weighted) collision search program

Post by oblique » February 12th, 2014, 2:21 am

dvgrn wrote: Well, as long as you have good subroutines to translate to and from this format so that nobody but a computer ever has to try and read it, it's certainly your project so I shouldn't complain! I think I'd be a little worried about the detail of encoding zeroes as spaces -- it makes the format not very human-readable, and also not very human-emailable or human-copyable-and-pastable. If you lose one of the two spaces at the beginning, don't you get crazy frame-shift errors?

Just don't post your encoding to the forums without [code] tags around it, I suppose.
Losing spaces would be very bad, yes. The situation could be improved if we'd allow a leading ~ at the start of the pattern or something. Even then the format is not really meant for us humans, I fear.
Maybe I leave out SP and stick to printable chars with a "visual identity" ;)

[quote="dvgrn"]
Have you had a good look at Koenig's Small Object Format, by the way?  No whitespace headaches, and I liked the way it handles numbers, with 0 = zero through 9 = nine, and then on up the ASCII table from there.  Small patterns often have short enough chunks of ON and OFF cells that the SOF is perfectly human-readable.  Just for comparison, your first pattern would be "3+53", and the blocks one would be "0342-0342+42-2".
[/quote]
I have looked at it. I don't know it by heart yet ;)

I just wanted a format that was as short as possible, and that contained information about subpatterns / islands.

[quote="dvgrn"]
[quote="oblique"]The main problem with this method would be that it relies on a set-in-stone enummeration of common subpatterns.[/quote]
That doesn't seem like such a terrible thing.  The search is bound to turn up hundreds of different subpatterns, if you count all phases and orientations.  Why not have a way of appending new unknown objects to the master list, so that they become known objects?  I suppose it could eventually be a communication problem if multiple people are running different searches, and adding different objects to their master lists.
[/quote]
The communication is the real problem here. If you have to send your subpattern catalogue with your pattern, the efficiency is reversed. If you don't, we'll have some 500 dialects by next Christmas.

That's the idea behind the "embedded rle" kludge.

Another problem is efficiency: matching some hundred or so subpatterns against the result is one thing. Doing this with thousands or more increasingly unlikely candidates might be another thing.

Well. Maybe not, if I make sure that all blocks and blinkers are found-and-removed before the program starts looking for 17-bit still lifes ;)

[quote="dvgrn"]
But I definitely like the idea of storing target patterns as sequences of named (or indexed) subpatterns.  Does that mean you're planning to go through each settled output pattern and figure out all of its constituent parts -- as opposed to storing it as an undifferentiated chunk of RLE, and just catching just gliders and *WSSes around the edges?
[/quote]
I'm at least tempted to analyse every settled pattern into its parts.
It might make the results more useful ...

[quote="dvgrn"]
[Assuming no loafers turn up, of course -- that would be nice, but very unlikely. But switch engines should turn up occasionally.]
[/quote]
Non-standard spaceships or switch engines would be counted as "pattern explodes", I fear.

[quote="dvgrn"]
I've written quick-and-dirty recognizer code a couple of times, and it keeps being more complicated than I think it should be.  My last script could tell that the ON phase of a beacon was one object, but the OFF phase looked like two objects.[/quote]
Hmm. As long as a preblock is not considered an interesting subpattern, my current code should find that phase.

User avatar
dvgrn
Moderator
Posts: 10610
Joined: May 17th, 2009, 11:00 pm
Location: Madison, WI
Contact:

Re: Prototype / demo for a (weighted) collision search program

Post by dvgrn » February 12th, 2014, 9:52 am

oblique wrote:Losing spaces would be very bad, yes. The situation could be improved if we'd allow a leading ~ at the start of the pattern or something. Even then the format is not really meant for us humans, I fear.
Maybe I leave out SP and stick to printable chars with a "visual identity" ;)
At various times in my dubious programming career, I've had significant amounts of unexpected trouble with using backslashes, carets, quote marks, @ signs, and probably a couple of other characters as parts of encoded data strings. Mostly in the later stages, when it was too late to change the over-optimized data format... You're coding in C++ and storing stuff in a mySQL/MariaDB database; it might be worth avoiding backslashes and quote marks, anyway?

In the past few years I've gotten a lot of use out of plain old text files plus a regular-expression search function, for one-off searches where I'd rather not have to write out long SQL statements. Encoding subpattern names and offsets in either SOF or oblique-format would make that kind of ad-hoc search pretty awkward -- translate the numbers you want according to an arbitrary character table, then add escape characters for the various special characters that mean something to a regexp engine.
oblique wrote:I just wanted a format that was as short as possible, and that contained information about subpatterns / islands.
Maybe that could be modified to "as short as reasonably possible"... For example, pattern indexes could be base-52, A-Z|a-z, and offset numbers could be written out in boring old base 10 decimal. That would have the advantage of being trivial to parse, resistant to frame-shift errors, and readable and writable by mere humans without special translation tools.
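
Something like this, say (a throwaway sketch with made-up helper names -- the point is just that letters and decimal digits never collide, so the entries stay parseable and greppable):

Code: Select all

#include <string>

// subpattern index in base 52: 0 -> "A", 25 -> "Z", 26 -> "a", 51 -> "z", 52 -> "BA", ...
std::string encode_index(int idx)
{
    std::string s;
    do {
        int d = idx % 52;
        s.insert(s.begin(), d < 26 ? char('A' + d) : char('a' + d - 26));
        idx /= 52;
    } while (idx > 0);
    return s;
}

// one entry: index in base 52, offsets in plain decimal, e.g. "B12,5"
std::string encode_entry(int idx, int dx, int dy)
{
    return encode_index(idx) + std::to_string(dx) + "," + std::to_string(dy);
}
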
oblique wrote:If you have to send your subpattern catalogue with your pattern, the efficiency is reversed. If you don't, we'll have some 500 dialects by next Christmas.

That's the idea behind the "embedded rle" kludge.
Yup, there's definitely a tradeoff there, and the embedded-rle idea is a good solution. Might be worth building a fairly comprehensive subpattern library early on, before you officially let the project out the door, so that the only embedded patterns are the interesting one-in-a-million cases. As you said, you don't lose much efficiency as long as the library is sorted by increasing unlikeliness.
oblique wrote:
dvgrn wrote: [Assuming no loafers turn up, of course -- that would be nice, but very unlikely. But switch engines should turn up occasionally.]
Non-standard spaceships or switch engines would be counted as "pattern explodes", I fear.
This is another case where there might be an optional slower-search approach -- follow all patterns out until they stabilize, or to some methuselah-sized number like 20K ticks. Anything that is still going past the ridiculously high limit is a really good methuselah, or more likely a switch engine, or just possibly some kind of spaceship that the recognizer code doesn't know about -- two *WSSes really close together, maybe?

oblique
Posts: 122
Joined: July 16th, 2013, 1:30 pm

Re: Prototype / demo for a (weighted) collision search program

Post by oblique » February 12th, 2014, 2:38 pm

dvgrn wrote: At various times in my dubious programming career, I've had significant amounts of unexpected trouble with using backslashes, carets, quote marks, @ signs, and probably a couple of other characters as parts of encoded data strings. Mostly in the later stages, when it was too late to change the over-optimized data format... You're coding in C++ and storing stuff in a mySQL/MariaDB database; it might be worth avoiding backslashes and quote marks, anyway?
You got a point there ...
dvgrn wrote: In the past few years I've gotten a lot of use out of plain old text files plus a regular-expression search function, for one-off searches where I'd rather not have to write out long SQL statements. Encoding subpattern names and offsets in either SOF or oblique-format would make that kind of ad-hoc search pretty awkward -- translate the numbers you want according to an arbitrary character table, then add escape characters for the various special characters that mean something to a regexp engine.
You are the one who wanted to save some space ;)
Anyway: I'll stick to RLE until the rest of the program is more or less usable. Changing this later is not too tricky.
dvgrn wrote: Maybe that could be modified to "as short as reasonably possible"... For example, pattern indexes could be base-52, A-Z|a-z, and offset numbers could be written out in boring old base 10 decimal. That would have the advantage of being trivial to parse, resistant to frame-shift errors, and readable and writable by mere humans without special translation tools.
I thought about base-64 encoding of numbers (well, not strictly - no aligning '=' at the end or anything). That is a well-tested encoding which should be usable almost anywhere ... still thinking ...
dvgrn wrote:
oblique wrote:If you have to send your subpattern catalogue with your pattern, the efficiency is reversed. If you don't, we'll have some 500 dialects by next Christmas.

That's the idea behind the "embedded rle" kludge.
Yup, there's definitely a tradeoff there, and the embedded-rle idea is a good solution. Might be worth building a fairly comprehensive subpattern library early on, before you officially let the project out the door, so that the only embedded patterns are the interesting one-in-a-million cases. As you said, you don't lose much efficiency as long as the library is sorted by increasing unlikeliness.
I would take some pattern survey collection like the one previously linked here a few days ago. It's sorted by likeliness, so it should be safe to cut it off about anywhere ... (of course the one-in-a-trillion cases won't be perfectly sorted ... but ...)
dvgrn wrote:
oblique wrote:
dvgrn wrote: [Assuming no loafers turn up, of course -- that would be nice, but very unlikely. But switch engines should turn up occasionally.]
Non-standard spaceships or switch engines would be counted as "pattern explodes", I fear.
This is another case where there might be an optional slower-search approach -- follow all patterns out until they stabilize, or to some methuselah-sized number like 20K ticks. Anything that is still going past the ridiculously high limit is a really good methuselah, or more likely a switch engine, or just possibly some kind of spaceship that the recognizer code doesn't know about -- two *WSSes really close together, maybe?
Most spaceships would be easily detectable if you keep running the pattern ... non-spaceship patterns tend to stay where they are or to grow in one or more directions. Spaceships move, and their bounding box thus moves with them (and stays more or less constant).
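
A rough version of that test (a sketch only, hypothetical names; the two snapshots are assumed to be taken a suitable number of generations apart, e.g. a multiple of the suspected period):

Code: Select all

#include <cstdlib>

struct Snapshot {
    int pop;                  // live-cell count
    int x0, y0, x1, y1;       // bounding box
};

// spaceship candidate: same population and box size, but the box has moved
bool looks_like_spaceship(const Snapshot &a, const Snapshot &b, int min_shift)
{
    bool same_shape = a.pop == b.pop &&
                      (a.x1 - a.x0) == (b.x1 - b.x0) &&
                      (a.y1 - a.y0) == (b.y1 - b.y0);
    int shift = std::abs(b.x0 - a.x0) + std::abs(b.y0 - a.y0);
    return same_shape && shift >= min_shift;
}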

That's not exactly the current focus of my program, but since it records the exploding / not-stabilizing reactions as such, it is no problem to revisit them later (maybe exporting those patterns and running them with something like bgolly - or a methuselah-search back end).

oblique
Posts: 122
Joined: July 16th, 2013, 1:30 pm

Re: Prototype / demo for a (weighted) collision search program

Post by oblique » February 25th, 2014, 2:23 pm

Update: the version of my search utility published here is now "somewhat usable".

I have been running it for a day and a few hours now. It has reached a maximum depth of 20 and has found over 70 LWSS and MWSS seeds already - alas, none of them seems to be of any real use (no edge shooters, no clean seeds ...)
(BTW: the search has eaten some 4 GB of my hard disk by now, storing about 13 million handled collisions and 15(?) million resulting targets).

The main problems at the moment are:
+ memory hunger: I currently queue all collisions resulting from a given pattern, sorted by "cost"; as a result, memory fills with - literally - millions and millions of collisions, most of which are not handled for hours or even days.
+ As a consequence you have to estimate the maximum depth you will be able to visit, or you'll run out of memory quickly (my search is at 3.2 GB of memory, at 80 bytes per unhandled reaction).
+ The program is only able to deal with ONE rake (for starters I use 1RL0).
+ Speed, of course.

User avatar
dvgrn
Moderator
Posts: 10610
Joined: May 17th, 2009, 11:00 pm
Location: Madison, WI
Contact:

Re: Prototype / demo for a (weighted) collision search program

Post by dvgrn » February 25th, 2014, 2:45 pm

oblique wrote:Update: the version of my search utility published here is now "somewhat usable".
Ah, good! I was just wondering this morning how the "new Glue" was coming along...
oblique wrote:The main problems at the moment are:
+ memory hunger: I currently queue all collisions resulting from a given pattern, sorted by "cost"; as a result, memory fills with - literally - millions and millions of collisions, most of which are not handled for hours or even days.
+ As a consequence you have to estimate the maximum depth you will be able to visit, or you'll run out of memory quickly (my search is at 3.2 GB of memory, at 80 bytes per unhandled reaction).
Sounds fairly inevitable for a search tree with this kind of branching factor. I guess you could write big chunks of the unhandled-reaction list to disk, and pull them back in when you need them, a chunk at a time.

Is there any reasonable way to distribute the search? You search Branch A, I search Branch B, and a bunch of other people try Branches C through Z, and then pool the results? I guess you'd end up re-searching a lot of nodes that can be reached in many different ways.

HartmutHolzwart
Posts: 840
Joined: June 27th, 2009, 10:58 am
Location: Germany

Re: Prototype / demo for a (weighted) collision search program

Post by HartmutHolzwart » February 25th, 2014, 4:57 pm

could we nevertheless see those seeds?

oblique
Posts: 122
Joined: July 16th, 2013, 1:30 pm

Re: Prototype / demo for a (weighted) collision search program

Post by oblique » February 26th, 2014, 2:00 am

HartmutHolzwart wrote:could we nevertheless see those seeds?
Sure - I'll make a stamp-collection (sometime today)

oblique
Posts: 122
Joined: July 16th, 2013, 1:30 pm

Re: Prototype / demo for a (weighted) collision search program

Post by oblique » February 26th, 2014, 2:16 am

dvgrn wrote:
oblique wrote:The main problems at the moment are:
+ memory hunger: I currently queue all collisions resulting from a given pattern, sorted by "cost"; as a result, memory fills with - literally - millions and millions of collisions, most of which are not handled for hours or even days.
+ As a consequence you have to estimate the maximum depth you will be able to visit, or you'll run out of memory quickly (my search is at 3.2 GB of memory, at 80 bytes per unhandled reaction).
Sounds fairly inevitable for a search tree with this kind of branching factor. I guess you could write big chunks of the unhandled-reaction list to disk, and pull them back in when you need them, a chunk at a time.
Thought about that (or storing unfinished reactions in the db) myself. Just not there yet. For the next step I'd rather change the way I'm handling the enumeration: instead of building all new "twigs" starting at a new node immediately after the node is calculated and then storing them for ages, I would rather keep the nodes and generate new twigs in the order they will be checked. (For the current search there are 12 to 31 possible lanes and thus up to 31 twigs per node.)
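
In pseudo-C++ the idea would be roughly this (hypothetical names; it assumes the lanes of one node can be tried in something close to increasing cost order):

Code: Select all

#include <queue>
#include <vector>

struct Node {
    // collision state, target, recipe so far, ...
    int cost;
};

// one pending "twig": a parent node plus the next untried lane
struct PendingTwig {
    const Node *parent;
    int next_lane;
    int cost;                 // estimated cost of that twig
};

struct ByCost {
    bool operator()(const PendingTwig &a, const PendingTwig &b) const
    {
        return a.cost > b.cost;      // min-heap on cost
    }
};

// When a PendingTwig is popped: run the collision for (parent, next_lane),
// store the result, then push the same parent back with next_lane + 1 if any
// lanes remain.  Memory per parent stays O(1) instead of O(number of lanes).
using WorkQueue = std::priority_queue<PendingTwig, std::vector<PendingTwig>, ByCost>;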

The dump to disk/db stuff is on my list nonetheless. Right now I have no way of splitting a search or resuming it after it was aborted (or "finished" at a predefined depth).
dvgrn wrote: Is there any reasonable way to distribute the search? You search Branch A, I search Branch B, and a bunch of other people try Branches C through Z, and then pool the results? I guess you'd end up re-searching a lot of nodes that can be reached in many different ways.
That is the problem, yes. I have no idea how to handle that efficiently - save setting up a central database, downloading "chunks" of reactions from there and storing the intermediate results back.
There are a few problems with that, starting with the resources we have available (upload bandwidth is a problem for me, for starters), and the central db would be a major bottleneck (in my current search the program itself does not saturate the CPU, so the *local* database engine is a kind of bottleneck already ...)

oblique
Posts: 122
Joined: July 16th, 2013, 1:30 pm

Re: Prototype / demo for a (weighted) collision search program

Post by oblique » February 26th, 2014, 2:24 am

Another thing:

I'm thinking about implementing other rakes.

Something like:
+ keep track of the "phase" of the trail vs. the current construction site.
+ define rephasers and rakes in terms like "rephases by X [and emits bullet on Y]"
+ enumerate lanes based on this information

Mixing forward rakes into this will be a pain, I fear. This would mean not only calculating a "lane offset" - we'd need a two-dimensional offset (modulo 31, probably).

User avatar
dvgrn
Moderator
Posts: 10610
Joined: May 17th, 2009, 11:00 pm
Location: Madison, WI
Contact:

Re: Prototype / demo for a (weighted) collision search program

Post by dvgrn » February 26th, 2014, 9:08 am

oblique wrote:Mixing forward rakes into this will be a pain, I fear. This would mean not only calculating a "lane offset" - we'd need a two-dimensional offset (modulo 31, probably).
Yup, this is a tricky one -- though mostly it's just math, so with any luck it will only be painful once. If we build, let's say, all known pairs of rakes, and figure out how to explain all those output lanes and weightings to SSCS, for each possible target distance... then hopefully it will all Just Work after that.

Combining forward and backward rakes will make searches a good bit more difficult to run and to interpret, unfortunately. If it's just backward rakes, or just forward rakes, then the distance to the target doesn't matter. [Sorry, just thinking out loud here.] But if there are gliders of both types -- even just slow gliders -- then there are 31 nontrivially different locations for each target. A block at 0,0 can't be hit by a given forerake/backrake combination in the same way as a block at (1,1); one rake will hit it in the same way, but the lane for the other rake will be shifted by (X+Y).

-- I'm guessing you won't want to worry about supporting the really complicated and prolific part of the "real" search tree, which is the modification of the target by slow pairs of gliders from forerake/backrake combinations with different relative timings (!). That would really send the branching factor out of control, but of course it would probably allow upship-seed recipes with a much lower total R-value.

The other thing that might be worth thinking about at this stage is initial targets. Rephasers can be trivially adjusted to produce anything on List 1 or List 2, but only in certain columns relative to the block trails. Those seem like particularly good targets to start searches from... On the other hand, I suppose we can only really get one or two cheap upship streams that way; any others will probably be out of reach of those key columns. So probably the cheapest starting target will be a Heisenburp (for HWSSes) or a custom forerake/backrake pair for other types of upships.

oblique
Posts: 122
Joined: July 16th, 2013, 1:30 pm

Re: Prototype / demo for a (weighted) collision search program

Post by oblique » February 26th, 2014, 1:42 pm

oblique wrote:
HartmutHolzwart wrote:could we nevertheless see those seeds?
Sure - I'll make a stamp-collection (sometime today)
See attachment
Attachments
sample.col.gz
(5.34 KiB) Downloaded 594 times

oblique
Posts: 122
Joined: July 16th, 2013, 1:30 pm

Re: Prototype / demo for a (weighted) collision search program

Post by oblique » February 26th, 2014, 1:53 pm

I killed my search this morning, since the machine it ran on became ... less responsive.

I have played a bit with the resulting data ...

Attached are some queries I ran, along with my comments on their results.

Also attached is a pattern which demonstrates multiple fly-bys ;)
Attachments
firstrun.txt
Some queries ...
(17.35 KiB) Downloaded 554 times
butterfly.rle.txt
one bullet flying by/thru 4 copies of one pattern ... (rename to .rle)
(954 Bytes) Downloaded 572 times

oblique
Posts: 122
Joined: July 16th, 2013, 1:30 pm

Re: Prototype / demo for a (weighted) collision search program

Post by oblique » February 26th, 2014, 2:08 pm

dvgrn wrote:-- I'm guessing you won't want to worry about supporting the really complicated and prolific part of the "real" search tree, which is the modification of the target by slow pairs of gliders from forerake/backrake combinations with different relative timings (!). That would really send the branching factor out of control, but of course it would probably allow upship-seed recipes with a much lower total R-value.
Well, guess what :D

My intention was to add *single* forward rakes to my search, so we collide a given target with either a forward or a backward rake (one at a time).
dvgrn wrote:The other thing that might be worth thinking about at this stage is initial targets. Rephasers can be trivially adjusted to produce anything on List 1 or List 2, but only in certain columns relative to the block trails. Those seem like particularly good targets to start searches from... On the other hand, I suppose we can only really get one or two cheap upship streams that way; any others will probably be out of reach of those key columns. So probably the cheapest starting target will be a Heisenburp (for HWSSes) or a custom forerake/backrake pair for other types of upships.
Hmm. Choosing initial targets is easy. Just put them together in a file (my program should be able to load RLE, gencol results and the traditional ***.*** notation without problems - just leave empty lines between two RLE or ***.*** patterns).

I didn't quite get which orientations of - say - the pi mess should be included ...
