"Posts per page" URL parameter?

For discussion directly related to ConwayLife.com, such as requesting changes to how the forums or home page function.
Post Reply
MikeP
Posts: 105
Joined: February 7th, 2010, 9:51 am
Location: Ely, Cambridgeshire, UK

"Posts per page" URL parameter?

Post by MikeP » May 2nd, 2020, 8:02 pm

Is there an URL parameter I can set to display more posts on each page? Ideally, all of them.

I'd like to download the entire Oscillator Discussion Thread and try to extract all the oscillators automatically.

User avatar
dvgrn
Moderator
Posts: 10612
Joined: May 17th, 2009, 11:00 pm
Location: Madison, WI
Contact:

Re: "Posts per page" URL parameter?

Post by dvgrn » May 2nd, 2020, 9:25 pm

MikeP wrote:
May 2nd, 2020, 8:02 pm
Is there an URL parameter I can set to display more posts on each page? Ideally, all of them.

I'd like to download the entire Oscillator Discussion Thread and try to extract all the oscillators automatically.
I don't know of one. However, here's some code on GitHub that's pretty easy to adapt to collect HTML for all the posts.

Here are the contents of all the code blocks in the Oscillator Discussion Thread
patterns-from-Osc-Discussion.zip
everything that looked like an oscillator pattern on 2 May 2020
(1.13 MiB) Downloaded 164 times
-- according to this boiled-down version of the get-all-patterns code:

Code: Select all

# get-all-patterns.py, version 1.618033
# Version 1: initial working version
# Version 1.618033: hacked version specific to Oscillator Discussion thread

import golly as g
import urllib2
import re
import os

url = "https://conwaylife.com/forums/viewtopic.php?f=2&t=1437&start="

outfolder = "c:/REPLACE/THIS/WITH/YOUR/PATH/"

maxpost = 1268
ptr =  0

count = 0
addthis = 100000

while ptr<maxpost:  # current most recent post
  g.show("Retrieving data for " + str(ptr+1) + " through " + str(ptr+25))
  try:
    resp = urllib2.urlopen(url + str(ptr))
    html = resp.read()
  except:
    g.note("Got an error on ptr=" + str(ptr))
    continue
  lastindex = 0
  notdone=1
  while notdone:
    g.show("Retrieved html for post page starting with " + str(ptr) + ". Last index = " + str(lastindex))
    index = html.find("<code>",lastindex) + 6
    if index==5: # should really have added 6 separately after checking for -1
      notdone=0
      break
    if html[index] in ['#', 'x', 'b', 'o', '$', '[', '.', 'O', '*']: # skip any code blocks that don't look like RLE or macrocell or ASCII files
      if html[index]=='#':
        if html[index+1] not in ['C','D','N','O']:
          # don't export this one, it's probably a Python script
          lastindex = index
          continue
      index2 = html.find("</code>", index)
    
      if html[index]=="[": # macrocell
        ind1 = html.find("#R",index)
        offset = 2
        ext = ".mc"
      else:
        ind1 = html.find("rule =")
        offset=6
        if ind1==-1:
          ind1 = html.find("rule=")
          offset=5
        ext = ".rle"
        if html[index] in ['.', 'O', '*']:
          ext = ".txt"
      if ind1>-1:
        ind2 = html.find("\n",ind1)
      
      patstr = html[index:index2]
      if patstr[-1]!="\n": patstr+="\n"
      fname = os.path.join(outfolder,"pat" + str(count+addthis)[1:]) + ext
      with open(fname,'w') as f:
        if ext==".mc":
          f.write("[M2] (golly VERSION)\n#C " + url + str(ptr) +"\n"+patstr)
        else:
          f.write("#C " + url + str(ptr) + "\n" + patstr)

        count += 1
    lastindex = index
  ptr+=25

g.note("Total patterns written: " + str(count))
The full get-all-patterns script uses a somewhat different method of retrieving all the patterns from every thread. Basically it starts with the first post and collects all the posts on the page that comes up, then repeats that for every post number up to the newest post.

Doing something like that would allow the comments in the pattern files to point to the specific post containing the pattern. The fastest way to produce this ZIP file was to get rid of the code that hunts down the specific post number, so the URLs in each pattern file just point to the page of 25 posts that contains the pattern. It wouldn't be too hard to add that post-hunting code back in, though, if that would be useful.

MikeP
Posts: 105
Joined: February 7th, 2010, 9:51 am
Location: Ely, Cambridgeshire, UK

Re: "Posts per page" URL parameter?

Post by MikeP » May 3rd, 2020, 7:25 am

Fantastic - that's exactly what I wanted to do - thanks very much!

Post Reply