[gopher] Re: Bot update
Hi John and all,
I had thought putting it on CD for starters was a good idea, and I would like
to reserve one myself if you do.
I will host it here on the gopher in its entirety but, as we discussed briefly
before, it's a big chunk to search through...
Perhaps Floodgap's Veronica can handle such a large database; I don't know. I
cannot handle that much information on a single Jughead and am looking for a
better way to let users search the entire database. Some other
possibilities came to mind, such as breaking it up into datasets and
having various boxen here, as well as at other gophers, each maintain a dataset
or sets. These could be split in various ways: by locale, alphanumerically, by
domain, or just from start to finish in chunks of a certain size. I don't know
if that's the best method or not, just bouncing some ideas around. Another
point concerns spidering and data retrieval. If anyone is
placing such a large database on a gopher, could we perhaps all make sure to put
it in the same /bigdatabase dir? That way we can skip it in Jughead multi-site
searches or whatever, as well as list it in robots.txt to minimize the chance
of accidentally grabbing it on a crawl.
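
As a rough sketch of the splitting and skipping ideas above (not anything that
exists yet): the function names, the dataset count of 8, and the choice of a
SHA-1 hash are all assumptions made for illustration. Only the /bigdatabase
directory name comes from the suggestion itself.

```python
import hashlib

NUM_DATASETS = 8              # hypothetical number of hosts sharing the archive
SKIP_PREFIX = "/bigdatabase"  # the shared directory name proposed above


def dataset_by_alpha(host: str) -> str:
    """Bucket a server by the first character of its hostname."""
    first = host.strip().lower()[:1]
    if first.isalpha():
        return first
    if first.isdigit():
        return "0-9"
    return "other"


def dataset_by_hash(host: str, selector: str, n: int = NUM_DATASETS) -> int:
    """Spread documents roughly evenly across n datasets with a stable hash."""
    key = f"{host}\t{selector}".encode("utf-8")
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:4], "big") % n


def should_skip(selector: str, prefix: str = SKIP_PREFIX) -> bool:
    """True if a crawler or multi-site indexer should leave this selector alone."""
    path = "/" + selector.lstrip("/")
    return path == prefix or path.startswith(prefix + "/")
```

The alphabetical split is easy for people to reason about but can be lopsided;
the hash split is more even but means nobody can guess which box holds a given
server without running the function. For crawlers that honor robots.txt, the
matching entry would be a "Disallow: /bigdatabase" line under "User-agent: *".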
These were just some thoughts I had. Thanks, John, for getting it. I think it's
awesome and am excited to see what we can all do with it.
Chris
gopher://hal3000.cx
On Tue, 29 Nov 2005 17:20:06 -0600
John Goerzen <jgoerzen@xxxxxxxxxxxx> wrote:
> On Wed, Nov 16, 2005 at 10:04:17PM -0600, Jeff wrote:
> > On Sun, 30 Oct 2005 21:48:51 -0600, John Goerzen <jgoerzen@xxxxxxxxxxxx>
> > wrote:
> >
> > > Here's an update on the gopher bot:
> > >
> > > There is currently 28G of data archived representing 386,315
> > > documents. 1.3 million documents remain to be visited, from
> > > approximately 20 very large Gopher servers. I believe, then, that the
> > > majority of gopher servers have been cached by this point. 3,987
> > > different servers are presently represented in the archive.
> >
> > Any news?
>
> Not really. The bot hit a point where its algorithm for storing page
> information was getting to be too slow, and there was also a problem
> with the database layer I'm using segfaulting. When I get some time, I
> will write a new layer.
>
> In the meantime, I'd like to talk about how to get this data to others
> that might be willing to host it, as well as how to store it out there
> for the public. Any ideas?
>
>
>
>
--
Join FSF as an Associate Member at:
<URL:http://member.fsf.org/join?referrer=3014>