confocaloid wrote: ↑January 11th, 2024, 8:57 pm
dvgrn wrote: ↑January 11th, 2024, 4:47 pm
Here's hoping somebody eventually decides to host a really big, and extensible, octohash-type search utility somewhere online!
I myself think a "generate, test, filter" search program would work better, when there are multiple different requests and filters that could be combined. It would be slower (since it would recompute constellations and evolve collisions each time), but it would need much less space, and it would work offline. Precomputing and storing all plausibly-interesting searches is going to become unfeasible.
Certainly that's true in some cases. On the other hand, we already have spark_search and QuFince, not to mention gencols.
The current octohash databases add up to about four and a half million collisions, so if I'm doing the math right (I might not be) ... that's just about one gigabyte of raw data -- 256*9*4500000. With the current search algorithm on a good computer, it takes something like 30 seconds to find all the matches in one gigabyte of data.
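As a quick check of that arithmetic, here is a minimal sketch. The post does not say whether 256*9 counts bits or bytes per collision, so both readings are shown as assumptions; only the bits reading comes out close to "about one gigabyte".

```python
# Figures taken from the estimate above; the unit of the product is an assumption.
collisions = 4_500_000
total = 256 * 9 * collisions

gb_if_bits = total / 8 / 1e9   # size in gigabytes if the product counts bits
gb_if_bytes = total / 1e9      # size in gigabytes if the product counts bytes

print(total)        # raw product
print(gb_if_bits)   # roughly 1.3
print(gb_if_bytes)  # roughly 10.4
```

Either way the raw data is small enough to fit on an ordinary laptop drive, which is the point the estimate is being used to make.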
YouTube is currently storing, in the cloud, somewhere around 2,500,000,000 gigabytes of data, any piece of which can be retrieved fairly quickly by going to the right URL.
An online octohash database could search a hundred times as many collisions as we have fingerprinted so far, at least a thousand times faster than the octohash scripts currently do it -- at the cost of making the data stored on the server between 100 and 1000 times bigger. That just requires adding indexes. I haven't really scoped out how good some kind of SQL database would be at this kind of trick, but this is exactly the kind of thing that SQL databases are designed and optimized for.
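To make the "just add indexes" idea concrete, here is a rough sketch using Python's built-in sqlite3 module. The table layout, column names, and sample hash strings are all hypothetical, chosen only for illustration; they are not the format of the real octohash files, and nothing here has been benchmarked against the existing scripts.

```python
import sqlite3

# Hypothetical schema: one row per (collision, tick) fingerprint.
# Table name, column names and sample values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fingerprints (collision_id INTEGER, tick INTEGER, octohash TEXT)"
)
conn.executemany(
    "INSERT INTO fingerprints VALUES (?, ?, ?)",
    [(1, 0, "a1b2"), (1, 64, "c3d4"), (2, 0, "c3d4"), (3, 128, "e5f6")],
)

# The index lets the database jump straight to matching rows
# instead of scanning every stored fingerprint.
conn.execute("CREATE INDEX idx_octohash ON fingerprints (octohash)")


def find_collisions(conn, target_hash):
    """Return sorted ids of collisions that have a fingerprint equal to target_hash."""
    rows = conn.execute(
        "SELECT DISTINCT collision_id FROM fingerprints "
        "WHERE octohash = ? ORDER BY collision_id",
        (target_hash,),
    ).fetchall()
    return [row[0] for row in rows]


matches = find_collisions(conn, "c3d4")
print(matches)
```

The trade-off is the one described above: the index takes extra space on the server, and in exchange each lookup no longer has to read through the whole data set.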
A hundred times the current octohash search space is a lot better than what we have now -- and it's totally doable with existing technology, even for a server hosted on somebody's laptop and made available only when it's convenient. There is always going to be some level of search that's unfeasible to do this way -- but at the same time, there's a much larger "octohash-reachable" search space that is actually very easy to set up.
Trying to work through all of that much larger octohash-reachable search space with a "generate, test, filter" approach would not just be a little bit slow; it would be unworkably slow -- like, weeks or months for each individual search.
The great advantage of the octohash approach is that you really only have to do those weeks or months of "generate, test, and filter" once -- and then record all of the generated data in octohash form, somewhere that has a lot of storage space. Storage is cheap; people's time is expensive, when a single search is eating up multiple days' worth of it.