I have a new proposal for a compact 3-state R1OT notation. Honestly, it's more of an evolved version of some of the things other people have already posted; the main design goal is just to be at least somewhat human-readable and human-writable, as well as reasonably compact for a lot of the prominent rules that have already been studied.
So here's an example rulestring: Bx1bc3acd8yd1e/C*0a3cx5-c/S*1bcd2bc3b
B indicates birth and S indicates survival as usual, and C indicates a change in state... x and y refer to the final state of the cell (1 or 2, respectively), so for example Bx indicates off cells becoming state 1 and Cy indicates state 1 cells becoming state 2. (Originally I wanted to use roman numerals i and ii, but that would create conflicts later on, it turns out...)
For each x transition, you have a bunch of segments that usually look a lot like Hensel notation of the form #___, where the number # is the number of state 1 cells and each letter _ is the number of state 2 cells (starting from a = 0, b = 1, c = 2, etc.). For y transitions numbers are state 2 and letters are state 1 — so the numbers are neighbors that are alike to the final state and the letters are neighbors that are the opposite live state from the final state.
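To make the segment format concrete, here is a minimal Python sketch of how a single number-first segment could be decoded. It assumes the Moore neighborhood (8 neighbors) and that a bare number with no letters means "any count of the other live state", as in the worked example further down. The name decode_segment is made up for illustration; no existing tool provides it.

```python
def decode_segment(segment: str, neighbors: int = 8) -> set[tuple[int, int]]:
    """Decode a number-first segment such as '3acd' into (number, letter) count pairs.

    Assumption: a = 0, b = 1, c = 2, ... and a bare number matches every
    count of the other live state that still fits in the neighborhood.
    """
    number = int(segment[0])
    letters = segment[1:]
    if letters:
        others = [ord(ch) - ord("a") for ch in letters]
    else:
        others = range(neighbors - number + 1)
    # Drop combinations that would need more neighbors than exist.
    return {(number, k) for k in others if number + k <= neighbors}


print(sorted(decode_segment("3acd")))  # [(3, 0), (3, 2), (3, 3)]
print(sorted(decode_segment("8")))     # [(8, 0)]
```

For an x transition the pairs read as (state-1 neighbors, state-2 neighbors); for a y transition they read as (state-2 neighbors, state-1 neighbors).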
Unlike Hensel notation, though, you can actually switch the numbers and letters if you want! So Bxa345 is an equally valid way of writing Bx3a4a5a (both representing birth of a state 1 cell on 3–5 state-1 neighbors and no state-2 neighbors, for instance). You just can't use both versions in the same segment. For example, if you wanted to specify that state 2 survives if it has either exactly one state-1 neighbor or exactly one state-2 neighbor but not both, you'd have to write Sy1-byb-1 (repeating the y; remember that b = 1).
Transitions are grouped at the smallest level in these Hensel-like units, then the final state (x or y), then type of transition (birth, state change, or survival) at the largest level (separated by / as usual).
Some more shortcuts exist:
- Similar to INT rules one can use dashes to invert something (i.e. "all but ___"), but this time you can invert it at different levels or even multiple at once (Bx1-c, Bxc-1, Bx-3b, Bx-3, Bx-3-b, Bx-b, etc.).
- Plain Sx or Sy (with nothing after it) means every cell of that state survives, and similarly for C transitions.
- Things like Sx#23 (the hash mark in particular) mean those transitions are fully totalistic based on live neighbors, with no distinction between state-1 and state-2 neighbors. (Normal transitions can't follow # transitions within the same x/y block, so you could either write "state 2 becomes state 1 on two or three live neighbors, or four of each state", as Cx4e#23 or Cx#23x4e, but not as Cx#234e.)
- B*3 or the like means * is standing in for both x and y, because both types of transitions are written the same way. So something like B*2-cd5ad would be equivalent to Bx2-cd5ady2-cd5ad.
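The * shortcut is easy to mechanize. Here's a rough Python sketch, assuming x, y and * only ever appear as block markers (the count letters only need a through i for up to 8 neighbors). expand_star is a hypothetical helper name, not part of any existing program.

```python
import re


def expand_star(section: str) -> str:
    """Rewrite every '*' block in one B/C/S section as an x block plus a y block."""
    kind, body = section[0], section[1:]
    # Each block is a marker (x, y or *) followed by everything up to the next marker.
    blocks = re.findall(r"([xy*])([^xy*]*)", body)
    expanded = []
    for target, segments in blocks:
        if target == "*":
            expanded.append("x" + segments + "y" + segments)
        else:
            expanded.append(target + segments)
    return kind + "".join(expanded)


print(expand_star("B*2-cd5ad"))   # Bx2-cd5ady2-cd5ad
print(expand_star("C*0a3cx5-c"))  # Cx0a3cy0a3cx5-c
```

The second call ends up with two separate x blocks, which seems fine given that repeated blocks like Cx#23x4e are allowed.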
So for the example (Bx1bc3acd8yd1e/C*0a3cx5-c/S*1bcd2bc3b), you have:
- Birth of state 1 on ((one state 1 cell) with (one or two state 2 cells)), ((three state 1 cells) with (zero, two, or three state 2 cells)), or eight state 1 cells; and birth of state 2 on ((one state 2 cell) and (three state 1 cells)) or four state 1 cells (regardless of state 2 cells);
- If a cell of one state has zero live neighbors, or ((exactly two same-state neighbors) and (exactly three opposite-state neighbors)), or it is state 2 and has ((exactly five state-1 neighbors) and (anything but exactly two state-2 neighbors)), it becomes the other live state;
- And if a cell has 1–3 of each type of neighbor and no more than 4 total live neighbors, it survives.
(I haven't actually looked at this rule at all, it's just an example of what a rulestring looks like.)
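As a sanity check on the survival part of that breakdown, the short Python sketch below enumerates the (like, unalike) pairs listed in S*1bcd2bc3b and compares them with the plain-English description. It only handles number-first segments with explicit letters, and like_unalike_pairs is just an illustrative name.

```python
import re


def like_unalike_pairs(block: str) -> set[tuple[int, int]]:
    """Collect (like, unalike) neighbor counts from segments such as '1bcd2bc3b'."""
    result = set()
    for number, letters in re.findall(r"(\d)([a-i]+)", block):
        for ch in letters:
            result.add((int(number), ord(ch) - ord("a")))  # a = 0, b = 1, ...
    return result


survive = like_unalike_pairs("1bcd2bc3b")
# "1-3 of each type of neighbor and no more than 4 total live neighbors"
described = {(s, o) for s in range(1, 4) for o in range(1, 4) if s + o <= 4}
print(survive == described)  # True
```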
Some simpler examples with more well-known rules:
- Immigration: B*2b3a/C/S*#23 — birth on three like neighbors or two like and one unalike; survival on two or three neighbors of any kind
- Brian's Brain: Bx2/Cy/S — birth of state 1 on two state-1 neighbors (regardless of state-2 neighbors); state 1 always becomes state 2, which always dies
- Symbiosis: B*3a/C/S*-a-23 — birth on three like neighbors and no unalike neighbors; always survive unless no unalike neighbors... unless two or three like neighbors (double inversion)
- VN-B1x2S23: Byb/Cx1/Sx23 — birth of state 2 on one state-1 neighbor, state 2 becomes 1 on one state-1 neighbor, state 1 survives on two or three state-1 neighbors, all regardless of state-2 neighbors
- MilhinSA: Bx0d1c/Cy/Sy1b2ab3a — birth of state 1 on three state-2 neighbors and no state-1 or two state-2 and one state-1; state 1 always becomes state 2; state 2 survives if two or three neighbors and at most one of those is state 1
- MixedLifeSeeds: Bx2b3ay2a/C/Sx23 — birth of state 1 on three state-1 neighbors and no state-2 or two state-1 and one state-2; birth of state 2 on two state-2 neighbors and no state-1; state 1 survives on two or three state-1 neighbors, regardless of state-2 neighbors
- Neutronium: Bx#3/Cy#8/Sx#23y — state 1 born on three live neighbors of any type, state 1 becomes state 2 on 8 live neighbors, state 1 survives on two or three live neighbors, state 2 always survives
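For the # transitions used in Immigration and Neutronium, a fully totalistic block just stands for every split of the total between the two live states. A small Python sketch of that expansion follows, assuming 8 neighbors and single-digit totals; totalistic_pairs is a hypothetical name.

```python
def totalistic_pairs(digits: str, neighbors: int = 8) -> set[tuple[int, int]]:
    """Expand the digits after '#' into (state-1, state-2) neighbor count pairs."""
    totals = {int(d) for d in digits}
    return {
        (n1, n2)
        for n1 in range(neighbors + 1)
        for n2 in range(neighbors + 1 - n1)
        if n1 + n2 in totals
    }


print(sorted(totalistic_pairs("3")))  # [(0, 3), (1, 2), (2, 1), (3, 0)]
print(len(totalistic_pairs("23")))    # 7
```

So Bx#3 in Neutronium covers four distinct neighbor combinations, and Sx#23 covers seven.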
Advantages of this notation:
- At least for many commonly-studied rules it's probably the most compact 3-state OT rulestring notation that's been developed
- Very good at conveying rules with certain types of state symmetries (including some well-studied ones like Generations and variants)
- IMO it's easier to read than most other notations I've seen (still not totally easy, but I can't imagine it being possible to make something too much easier than this)
- A lot of the syntax is fairly familiar, even if it has different meanings sometimes
- Separators! Heterogeneity! Most of the time you can actually look at the middle of a rulestring and figure out what's going on, instead of having to go back to the beginning and read it all the way through to figure out which numbers/letters in the long string of uninterrupted decimal, hex, or base-64 digits mean what
- It could also be a powerful tool for constructing rules (just like Hensel notation is for INT rulegolfing), since it breaks down the types of sets of transitions that can occur in 3-state OT in a reasonably intuitive framework
Disadvantages of this notation:
- Still not totally easy to read, even though it's better than most
- Not compatible with a lot of existing tools — case-sensitive (rip Catagolue), special characters, C is a bit of a conflict with Generations syntax
- Hard to canonize algorithmically — you can define the canonical rulestring as the lexicographically earliest one at the shortest possible length, but computing what it is for a given rule seems very challenging in the general case
- Doesn't enforce internal consistency — you can define a given transition to exist and not exist at once, or you can define a cell to become two different states at once, in several different ways without being (locally) syntactically invalid
- In a few cases (like long strings of a0b1c2d3e4...) it still has the same garden path problem as a lot of the notations that lack separators entirely
- Difficult to extend to more states (since there aren't any more obvious ordered sequences of characters to use after numbers and letters)
- Very bad at conveying rules with certain other types of state symmetries (like plus minus rules, for instance)
- Not supported by any program yet, or even a rule table generating script, since I'm too lazy and hate string manipulation too much to write and debug one
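On the internal-consistency point, here is a tiny Python illustration of how two individually valid blocks can collide. It only models bare-number segments, assumes 8 neighbors, and uses the made-up helper bare_number_pairs. A rulestring containing both Bx3 and By3 asks an off cell with three state-1 and three state-2 neighbors to become both states at once.

```python
def bare_number_pairs(number: int, target: str, neighbors: int = 8) -> set[tuple[int, int]]:
    """Return (state-1, state-2) count pairs matched by a bare number in an x or y block."""
    pairs = set()
    for other in range(neighbors - number + 1):
        # In an x block the number counts state-1 neighbors; in a y block, state-2.
        pairs.add((number, other) if target == "x" else (other, number))
    return pairs


conflicts = bare_number_pairs(3, "x") & bare_number_pairs(3, "y")
print(conflicts)  # {(3, 3)}
```

A real validator would need to do this for every pair of blocks that share a starting state, which is part of why the notation can't rule out such conflicts syntactically.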
I probably made multiple mistakes somewhere in here, so please don't hesitate to ask if something doesn't seem right or if you're just confused for whatever reason, like I didn't explain well, which I probably didn't... (Also I'd be remiss if I didn't link xkcd 927 to call myself out.)