
[gopher] Re: Bot update

To: gopher@xxxxxxxxxxxx
Subject: [gopher] Re: Bot update
From: Chris <chris@xxxxxxxxxx>
Date: Tue, 29 Nov 2005 21:03:35 -0600
Reply-to: gopher@xxxxxxxxxxxx

Hi John and all,
  I thought putting it on CD for starters was a good idea, and I would like
to reserve a copy myself if you do. I will host it here on the gopher in its
entirety, but, as we spoke about briefly before, it's a big chunk to search
through...
Perhaps Floodgap's Veronica can handle such a large database; I don't know. I
cannot handle that much information on a single Jughead and am looking for a
better method to let users search through the entire database. Some other
possibilities came to mind, such as breaking it up into datasets and having
various boxen here, as well as at other gophers, each maintain a dataset or
sets. These could be broken up in various ways: by locale, alphabetically or
numerically, by domain, or just from start to finish in chunks of a certain
size. I don't know if that's the best method or not; I'm just bouncing some
ideas around (a rough sketch of the domain idea follows below).
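
To make the domain split concrete, here is a minimal sketch in Python of how
crawled hosts could be dealt out to participating servers. Everything in it
is hypothetical (the shard count, the shard_for name; the sample hosts are
just gophers from this thread); it only shows that a stable assignment is
easy to compute, so each gopher would know which slice it maintains.

  # Hypothetical sketch: map each gopher hostname to one of N dataset
  # shards, so each participating server maintains a predictable slice.
  import hashlib

  NUM_SHARDS = 8  # assumption: eight volunteer hosts

  def shard_for(host):
      """Return a stable shard number for a gopher hostname."""
      digest = hashlib.md5(host.lower().encode("utf-8")).hexdigest()
      return int(digest, 16) % NUM_SHARDS

  for host in ("hal3000.cx", "gopher.quux.org", "gopher.floodgap.com"):
      print("%s -> shard %d" % (host, shard_for(host)))
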
Another point, out of a concern about the various spidering methods and data
retrievals out there: if anyone is placing such a large database on a gopher,
could we perhaps all make sure to put it in the same /bigdatabase dir? This
way we can skip it on Jughead multi-site searches and the like, as well as
list it in robots.txt, to try and minimize accidentally grabbing it on a
crawl.
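
For the robots.txt half of that convention, the entry could be as simple as
the sketch below; this assumes the crawlers in question fetch and honor a
robots.txt selector the way web-style robots do.

  User-agent: *
  Disallow: /bigdatabase

Human visitors could still reach the directory through our menus; only
well-behaved bots would skip it.
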
These were just some thoughts I had. Thanks, John, for gathering it all; I
think it's awesome, and I am excited to see what we can all do with it.
Chris
gopher://hal3000.cx


On Tue, 29 Nov 2005 17:20:06 -0600
John Goerzen <jgoerzen@xxxxxxxxxxxx> wrote:

> On Wed, Nov 16, 2005 at 10:04:17PM -0600, Jeff wrote:
> > On Sun, 30 Oct 2005 21:48:51 -0600, John Goerzen <jgoerzen@xxxxxxxxxxxx>  
> > wrote:
> > 
> > > Here's an update on the gopher bot:
> > >
> > > There is currently 28G of data archived representing 386,315
> > > documents.  1.3 million documents remain to be visited, from
> > > approximately 20 very large Gopher servers.  I believe, then, that the
> > > majority of gopher servers have been cached by this point.  3,987
> > > different servers are presently represented in the archive.
> > 
> > Any news?
> 
> Not really.  The bot hit a point where its algorithm for storing page
> information was getting to be too slow, and there was also a problem
> with the database layer I'm using segfaulting.  When I get some time, I
> will write a new layer.
> 
> In the meantime, I'd like to talk about how to get this data to others
> that might be willing to host it, as well as how to store it out there
> for the public.  Any ideas?


-- 
Join FSF as an Associate Member at:
<URL:http://member.fsf.org/join?referrer=3014>


