apgcode
This article needs to be expanded with: information about linear growth apgcodes. |
apgsearch and its search results database, Catagolue, use apgcodes to classify and denote patterns.
Contents
Encoding objects
apgcodes consist of a prefix and a suffix, separated by an underscore; both the prefix and the suffix are alphanumeric strings. The prefix further consists of a two-character type and a number. The types are:
- xs for a still life;
- xp for an oscillator;
- xq for a spaceship;
- yl for a periodic linearly growing object, such as a puffer or gun; and
- methuselah for a methuselah.
- messless for a diehard.
The number following the type represents the population for a still life, the lifespan for a methuselah or diehard, or otherwise the period of the object.
In codes for xs, xp, and xq, the suffix following the underscore is a representation of the object in Extended Wechsler Format (see below); in codes for yl, the suffix consists of additional information encoding higher-order periods of the pattern.
Unencodable objects
In their original form, apgcodes are not guaranteed to encode still lifes, oscillators and spaceships exceeding a 40×40 bounding box.^{[note 1]} These are instead reported by apgsearch and censused by Catagolue as oversized patterns, using the following types:
- ov_s for an oversized still life;
- ov_p for an oversized oscillator;
- ov_q for an oversized spaceship.
These types are then followed by a number representing the population for a still life, or otherwise the period of the object, as in the non-oversized case. apgcodes of this form do not uniquely identify a specific object, but rather represent classes of objects.
An extension to apgcodes, dubbed "greedy apgcodes" (see below), was later used to relax the 40×40 bounding box constraint, although some sufficiently large patterns remain canonically unencodable.
Unclassifiable objects
Objects which apgsearch cannot classify as above are denoted by one of:
- zz_EXPLOSIVE
- zz_LINEAR
- zz_QUADRATIC
- zz_REPLICATOR
- PATHOLOGICAL (e.g. high-period oscillators)
Chaotic-growth (zz_*) patterns are classified according to certain heuristics; the labels chosen do not necessarily reflect the nature of the object encountered, so an object classified as e.g. zz_REPLICATOR need not be an actual replicator.
Like oversized apgcodes, these apgcodes do not uniquely identify specific objects.
Extended Wechsler Format
Non-oversized still lifes, oscillators and patterns are encoded in extended Wechsler format, an extension of a pattern notation developed by Allan Wechsler in 1992.
- The pattern is separated into horizontal strips of five rows.
- Each strip, n columns wide, is encoded as a string of n characters in the set {0, 1, 2, ..., 8, 9, a, b, ..., v} denoting the five cells in a vertical column corresponding to the bitstrings {'00000', '00001', '00010', ..., '01000', '01001', '01010', '01011', ... '11111'}.
- The characters 'w' and 'x' are used to abbreviate '00' and '000', respectively, and the symbols {'y0', 'y1', y2', ..., 'yx', 'yy', 'yz'} are used to encode runs of between 4 and 39 consecutive '0's.
- Extraneous '0's at the ends of strips are not included in the encoding. (Note that in particular, blank five-row strips are represented by the empty string.)
- The character 'z' separates contiguous five-row strips.
Examples
xq4_27deee6 encodes a HWSS:
xs31_0ca178b96z69d1d96 encodes a 31-bit still life; note how the 'z' separates the two five-row strips:
xp30_w33z8kqrqk8zzzx33 encodes a trans-queen-bee-shuttle; note how five-row strips are represented by the empty string:
xp2_31a08zy0123cko is a quadpole on ship; note how trailing zeros in the first five-row strip are not encoded:
Canonical form
In order to enforce a canonical form, there are further rules regarding encoding:
- The leftmost column and uppermost row must each contain at least one live cell. (This gives a canonical position.)
- A canonical orientation and phase must be determined. For example, with the caterer (p3 oscillator with no symmetry), there are three phases and eight orientations, so we have 24 possible encodings. A total order on these encodings is defined as follows:
- Shorter representations are preferred to longer representations;
- For representations of the same length, lexicographical ASCII ordering is applied, and preference given to earlier strings.
This gives, for any still-life, oscillator or spaceship, an unambiguous canonical code to represent the pattern. It has several desirable properties:
- Compression: it is much more compact than RLE or SOF for storing very small patterns, and often even beats the common name ('xp15_4r4z4r4' is shorter than 'pentadecathlon')!
- Character set: it only uses digits, lowercase letters and the underscore, so can be safely used in filenames and URLs.
- Human-readability: the prefix means that we can instantly see whether a particular object is a still-life (and if so, what size), oscillator (and if so, what period) or spaceship (and if so, what period). It also means that the string is instantly recognised as being an encoding of an object ('xp2_7' is obviously a blinker, whereas the digit 7 on its own with no extra context is ambiguous).
Adapting apgcodes to larger patterns
As apgcodes were originally limited to patterns fitting into a 40×40 bounding box, different ways of extending them to allow unambiguously encoding larger patterns were proposed.^{[1]} In their original form, apgcodes are not guaranteed to be able to encode larger patterns, due a lack of well-definedness for runs of 40 or more consecutive zeroes, and apgsearch versions up to 3.x would report such patterns as oversized instead of attempting to encode them.
Extended ("greedy") apgcodes were eventually adopted to allow larger patterns to be encoded. These avoid the problem by stipulating that a run of n zeroes be encoded as follows:
- If n is less than 40, the run is encoded as in regular apgcodes, using the characters w, x, and y0 .. yz as appropriate.
- If n is equal to or greater than 40, the run is encoded as yz (representing the first 39 zeroes), following by the encoding for a run of n - 39 zeroes according to this definition.
That is to say:
This extension, which is used on the LifeWiki for larger patterns, retains compatibility with existing apgcode decoders. Adapting encoders to produce extended apgcodes is similarly straightforward.
Greedy apgcodes were adopted for apgsearch 4.x^{[2]}, subject to the following constraints:
- (width + 2) * (height + 2) <= 10000;
- the resulting apgcode, without the prefix, cannot exceed 1280 bytes.
Adapting apgcodes to multistate rules
Work is ongoing on extending apgcodes to multistate (Generations) rules.^{[3]} For this, apgcodes encoding still lifes, oscillators and spaceships will contain multiple suffixes, separated by underscores.^{[note 2]}
Each suffix will encode one 'layer' of the pattern, with patterns in an n-state rule having 2+⌈log_{2}(n-2)⌉ layers, where layer 0 is 'live', layer 1 is 'refractory', and the remaining layers implement a binary counter.
More precisely, a pattern in an n-state Generations rule is represented as follows:^{[4]}^{[5]}
- "Dying" states are renumbered to 2, 6, 10, 14, 18, ..., in reverse chronological order.
- Each state is represented as a binary number.
- An extended Wechsler representation is generated for each bit position, starting with the least significant bit.
- The resulting representations, now called "layers" are concatenated with underscores to form a representation of the entire pattern.
To obtain the pattern's canonical object, representations are generated for each phase and each possible orientation of the pattern, and the canonical representation is chosen as described in the "Canonical form" section above. Finally, a suitable prefix describing the pattern's overall behavior is prepended.
In 3-state rules, this simplifies to the two layers directly representing cell states 1 and 2, respectively.
Examples
For example, the following spaceship in Brian's Brain (Generations rule /2/3, aka B2/S/G3) is represented by the apgcode xq4_3482h8a_03482lx8:
Catagolue: here (click above to open LifeViewer) |
As Brian's Brain is a 3-state rule, the two layers of the code directly represent cells in states 1 (live) and 2 (dying); note how x represents 000 in the second layer:
Putting both layers together yields the complete pattern:
For a more complex example, consider the following reflectorless rotating oscillator in an 11-state Generations version of Conway's Game of Life (23/3/11), xp32_ss2gzw11_wsfm76cg_x86668_wke2024_ws6i1_x1:
Catagolue: here (click above to open LifeViewer) |
In order to encode this pattern, states 2, 3, ..., 10, 11 are renumbered 34, 30, ..., 6, 2; six layers are then used to encode each bit of these new state numbers:
Putting the layers together yields the complete pattern:
Long-lived soups
Unlike with other patterns, apgcodes for methuselahs and diehards classify the soup itself rather than the individual objects that emerge from it, and do not convey any information other than the soup's lifespan.
Ordinary methuselahs are classified under the methuselah prefix with the suffix being the soup's lifespan with the last three digits truncated and the letter k to stand for "thousand." For example, 42100M was recorded by apgsearch as methuselah_42k, and is therefore collected by Catagolue with all other soups lasting between 42,000 and 42,999 generations.^{[note 3]}
Diehards, or long-lived soups which leave behind no ash, are classified similar to other methuselahs, but under the messless prefix. Only two digits are truncated for messless soups and replaced with the letter h to stand for "hundred."^{[note 4]}
Soups with large final populations are classified under the megasized prefix. Like with diehards, two digits are truncated from the population count.^{[note 5]}
Limitations
apgcodes can be ambiguous if a rule is neither specified nor implicitly assumed, with the same code being assigned to different patterns evolving differently. For example, the code xp2_2a54 describes at least three different period-2 oscillators in outer-totalistic rules which are all titled as "clock"; further variants exist in non-totalistic rules:
B3/S3 to B35678/S02345678 | B34/S to B345678/S0245678 | B4/S1 to B45678/S01245678 |
Also see
Notes
- ↑ Patterns exceeding a 40×40 bounding box can still be encoded using "classic" apgcodes if they match either fit within such a box in at least one phase, or if, alternatively, they do not contain runs of more than 39 zeroes in their extended Wechsler representation.
- ↑ That is to say, these apgcodes will match the following regular expression: x[spq][0-9]+(_[0-9a-z]+)*
- ↑ Only methuselahs lasting at least 25,000 generations are reported by apgsearch, and only by versions v4.54 or later. Additionally, apgsearch estimates the lifespan of each soup before testing it more precisely, and is not guaranteed to detect all methuselah_25k soups.
- ↑ Only diehards lasting at least 500 generations are reported by apgsearch, and only by versions v4.69 or later.
- ↑ Only soups with final populations of at least 3,000 are reported by apgsearch, and only by versions v5.03 or later.
References
- ↑ Extending apgcodes to larger patterns (discussion thread) at the ConwayLife.com forums
- ↑ Re: apgsearch v3.1 (discussion thread) at the ConwayLife.com forums
- ↑ Re: apgsearch v3.1 (discussion thread) at the ConwayLife.com forums
- ↑ Re: Catagolue browser extension (discussion thread) at the ConwayLife.com forums
- ↑ Re: Catagolue browser extension (discussion thread) at the ConwayLife.com forums