Finer points of file formats

Post by jwodder » December 24th, 2023, 1:50 pm

I'm writing a Rust library for doing Game of Life stuff, and, in implementing support for the Plaintext and Run Length Encoded file formats, I've encountered some issues with the descriptions on the wiki. Some of these are just areas where the descriptions could be more exact, but there are also some parts of the descriptions that suggest that some or many of the pattern files on conwaylife.com are invalid.

For the plaintext format:
  • The format description states that the first line of the file is supposed to be of the form "!Name: Something", but 2540 out of the 3210 .cells files in all.zip lack this line entirely. Is the "!Name" line not actually required?
  • Are comments permitted in the middle of a pattern drawing, e.g.:

    !Name: Glider
    .O.
    ..O
    !Ooo, spooky!
    OOO
    
  • Is it permissible for the name header to be of the form "!Name:Something", without any whitespace after the colon?
  • Must the space after "!Name:" be the space character (U+0020), or can it be a tab or Unicode whitespace?
  • If "!Name:" is followed by more than one space (and/or tab or Unicode whitespace?), are the spaces after the first stripped or included in the name?
  • What counts as a newline sequence in this format? The RLE page states that "DOS [i.e., CR LF], Unix [i.e., LF] and Mac [i.e., CR] newline conventions are all acceptable", but the Plaintext page is silent on this.
  • When outputting the 0×0 pattern (xs0_0 in Catagolue) in Plaintext format, is there a consensus on whether the drawing should consist of a single blank line or just be empty/absent?
For the RLE format:
  • Are blank lines allowed between the '#' lines and the header? The fireshiprake.rle and tubwithnine.rle files do this.
  • Is leading whitespace allowed on lines? The pseudobarberpole_synth.rle file does this.
  • What characters are allowed as the "type" letter after the '#' in a '#' line? Must "type" letters always be ASCII alphabetic characters? Are other ASCII letters and/or Unicode alphabetic characters allowed?
  • The same questions as above about the whitespace after "!Name:" also apply to the whitespace after "type" letters.
I realize there's almost certainly no "official" standard to refer to for resolving my questions and that the format definitions are likely driven more by what major implementations emit & accept, but hopefully there's something approaching a consensus on the questions above. I don't want to just implement the formats whatever way I want in my own code, as that's the sort of thing that led to the hellscape of browser-specific HTML back in the day.

Post by dvgrn » December 24th, 2023, 4:06 pm

jwodder wrote:
December 24th, 2023, 1:50 pm
I realize there's almost certainly no "official" standard to refer to for resolving my questions and that the format definitions are likely driven more by what major implementations emit & accept, but hopefully there's something approaching a consensus on the questions above. I don't want to just implement the formats whatever way I want in my own code, as that's the sort of thing that led to the hellscape of browser-specific HTML back in the day.
Some of the history behind the "plaintext" format can be found by following the link in the Plaintext article. There was some hope of getting things standardized in 2020, but I don't recall that I ever got a collection of updated .cells files to upload.

The part of the Plaintext article that mentions that other variants "should be corrected" was added by GUYTU6J in 2021, in reference to HubTou's project. The idea is that it makes sense to standardize .cells files as they're provided on the LifeWiki in all.zip -- not that there's anything really wrong with the dozens of other ASCII formats that have been used elsewhere.

There are lots and lots of ASCII-format patterns in various collections that don't have a first-line "Name" header in the exact format that the article specifies, for example. Usually if you're just trying to display the pattern, the simplest thing to do is ignore any header lines completely. If your program can recognize a first-line format and confidently extract a name, that's fine of course; otherwise the filename could probably be used as a substitute name, or if that doesn't work then the pattern can just be called "noname" or "unknown".
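
For what it's worth, the fallback chain described above could be sketched in Rust roughly like this. It is only an illustration, not how any existing tool does it: the function name `pattern_name` is made up, and it assumes that only a first line starting with "!Name:" counts as a name header and that any whitespace after the colon is trimmed rather than kept.

```rust
use std::path::Path;

/// Picks a display name for a Plaintext (.cells) pattern.
///
/// Hypothetical helper. Assumptions: only a first line that starts with
/// "!Name:" counts as a name header, and whitespace around the name is
/// trimmed rather than preserved.
pub fn pattern_name(contents: &str, path: Option<&Path>) -> String {
    // 1. Try the first line of the file.
    //    Note: `lines()` splits on LF and CR LF only, not on a lone CR.
    let from_header = contents
        .lines()
        .next()
        .and_then(|line| line.strip_prefix("!Name:"))
        .map(str::trim)
        .filter(|name| !name.is_empty())
        .map(str::to_owned);

    // 2. Otherwise fall back to the file name without its extension.
    let from_path = || {
        path.and_then(Path::file_stem)
            .and_then(|stem| stem.to_str())
            .filter(|stem| !stem.is_empty())
            .map(str::to_owned)
    };

    // 3. If neither works, use a placeholder name.
    from_header
        .or_else(from_path)
        .unwrap_or_else(|| "noname".to_owned())
}
```

Calling `pattern_name(text, Some(Path::new("glider.cells")))` on a file with no header would give "glider" under these assumptions; a stricter parser could instead report the missing header and let the caller decide.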

I can try to answer the more detailed questions if it still seems like those answers would be useful, but generally weird whitespace characters can very safely be converted into good old ASCII 32 -- and then mostly they'll be ignored anyway. Newline characters can be similarly standardized; there won't be any useful information hiding in newline encodings that have emanated from different OSes.
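
A pre-pass along those lines might look like the following sketch. The function name `normalize_text` is hypothetical, and treating every character that Rust's `char::is_whitespace` accepts (tabs, no-break spaces and so on) as a plain space is an assumption on my part, not something the wiki articles specify.

```rust
/// Normalizes a pattern file before parsing (hypothetical helper).
///
/// CR LF, lone CR and LF all become a single LF; every other character
/// for which `char::is_whitespace` is true becomes U+0020. Which
/// characters count as "weird whitespace" is an assumption here.
pub fn normalize_text(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\r' => {
                // Treat CR LF as one newline; a lone CR is also a newline.
                if chars.peek() == Some(&'\n') {
                    chars.next();
                }
                out.push('\n');
            }
            '\n' => out.push('\n'),
            // Tabs, no-break spaces, etc. collapse to an ASCII space.
            c if c.is_whitespace() => out.push(' '),
            c => out.push(c),
        }
    }
    out
}
```

Running this once up front means the rest of the parser only ever has to deal with LF and ASCII 32.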

For some of the RLE format questions, especially the part about the "type" character following the "#", there's a table in the RLE article. If you're unlucky, you'll trip over the occasional RLE file where just "#" was used with no "#C" or any other type character; Golly supports this option, I believe. There are a lot more type characters that were used sometimes long long ago -- like, back in the previous millennium -- but they show up so rarely nowadays that it's probably better to let them languish in well-deserved obscurity than to give them any new attention.
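
As a rough sketch of a parser that tolerates the bare "#" form, something like the Rust below could work. The names `HashLine` and `parse_hash_line` are invented for the example, and several choices are assumptions rather than anything the RLE article mandates: any ASCII letter is accepted as a type character, leading whitespace before the "#" is skipped, and a "#" is treated as bare only when the next character is not a letter.

```rust
/// One '#' line from the preamble of an RLE file (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct HashLine {
    /// The type character right after '#', if there is one (e.g. 'C').
    pub kind: Option<char>,
    /// The rest of the line with surrounding whitespace trimmed.
    pub text: String,
}

/// Parses a single line; returns `None` if it is not a '#' line.
///
/// Assumptions: leading whitespace is skipped, and any ASCII letter
/// directly after '#' is taken as the type character. A line such as
/// "#comment" is therefore read as type 'c' with text "omment".
pub fn parse_hash_line(line: &str) -> Option<HashLine> {
    let rest = line.trim_start().strip_prefix('#')?;
    let mut chars = rest.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() => Some(HashLine {
            kind: Some(c),
            // `as_str` gives whatever follows the type character.
            text: chars.as_str().trim().to_owned(),
        }),
        // Bare "#": no type character, so the whole remainder is text.
        _ => Some(HashLine {
            kind: None,
            text: rest.trim().to_owned(),
        }),
    }
}
```

Keeping the type character as a plain `Option<char>` rather than an enum of known letters leaves the obscure old types readable without the parser having to know about them.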
