[gopher] Re: Cicada Incomplete Gopher Census

To: gopher@xxxxxxxxxxxx
Subject: [gopher] Re: Cicada Incomplete Gopher Census
From: Cameron Kaiser <spectre@xxxxxxxxxxxx>
Date: Mon, 31 May 2004 01:26:46 -0700 (PDT)
Reply-to: gopher@xxxxxxxxxxxx

> > After the V-2 cleanup this weekend, it has pared itself down to
> > 255 unique hosts and a database of about 1.8 million selectors.
> 
> OK, I found only 154, so I clearly have a bug.  My selector counts
> seem very low, too.  I'm not sure it's worth debugging given that the
> floodgap index is updating again, but just in case I get bored: my
> spider is supposed to follow only selectors with type 1 or 11.  Are
> there other directory types that I should follow?

Besides the fact that '11' per se isn't an itemtype (it's just a '1'), no,
that's all this robot follows. I have the advantage of having had an old
partially filled hosts database to iterate through, so my host list fans
out faster than if it were left to discover hosts entirely unaided.
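To illustrate the itemtype point, here is a minimal sketch of how a spider might pick directory entries out of a Gopher menu. It assumes the RFC 1436 menu layout (one type character, then display string, selector, host and port separated by tabs, with a lone "." ending the menu). The names parse_menu and directories are hypothetical and are not taken from the Veronica-2 robot.

```python
def parse_menu(text):
    """Yield (itemtype, display, selector, host, port) for each menu line."""
    for line in text.splitlines():
        if line == ".":              # a lone dot terminates the menu
            break
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4:          # malformed line, skip it
            continue
        display, selector, host, port = fields[:4]
        try:
            port = int(port)
        except ValueError:           # non-numeric port, skip it
            continue
        # The itemtype is only the first character of the line.
        yield line[0], display, selector, host, port


def directories(text):
    """Return (host, port, selector) for every itemtype '1' entry."""
    return [
        (host, port, selector)
        for itemtype, _display, selector, host, port in parse_menu(text)
        if itemtype == "1"
    ]
```

Note that the check is against the single leading character, so a selector that itself happens to begin with "1" does not change the itemtype.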

> How does floodgap's Veronica-2 spider limit the load it places on
> sites?  Does it check for a robots.txt file, or some similar mechanism?

Yes (see

        gopher://gopher.floodgap.com/0/v2/help/indexer

), and it also rotates through the list of hosts it's working through,
trying not to bang on any one host more than a couple of times per minute
at the very most.
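The rotation idea can be sketched roughly as follows. This is not the Veronica-2 code, only one way to do it: the class name HostRotation, the 30-second default interval (about two hits per minute) and the injectable clock are all assumptions made for the example.

```python
import time
from collections import deque


class HostRotation:
    """Round-robin over hosts with a minimum delay between hits per host."""

    def __init__(self, min_interval=30.0, clock=time.monotonic):
        self.min_interval = min_interval   # seconds between hits on one host
        self.clock = clock
        self.hosts = deque()               # rotation order
        self.pending = {}                  # host -> deque of selectors
        self.last_hit = {}                 # host -> time of last request

    def add(self, host, selector):
        if host not in self.pending:
            self.pending[host] = deque()
            self.hosts.append(host)
        self.pending[host].append(selector)

    def next_request(self):
        """Return (host, selector) for a host that is due, else None."""
        now = self.clock()
        for _ in range(len(self.hosts)):
            host = self.hosts[0]
            self.hosts.rotate(-1)          # move this host to the back
            last = self.last_hit.get(host, float("-inf"))
            if now - last < self.min_interval:
                continue                   # hit too recently, try the next host
            selector = self.pending[host].popleft()
            if not self.pending[host]:     # nothing left queued for this host
                del self.pending[host]
                self.hosts.remove(host)
            self.last_hit[host] = now
            return host, selector
        return None
```

A crawl loop would call next_request() repeatedly and sleep briefly whenever it returns None. The robots.txt check is a separate step that this sketch does not cover.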

I'm almost done with the tuning changes, and the indexer will probably be
released again sometime tomorrow afternoon.

-- 
---------------------------------- personal: http://www.armory.com/~spectre/ --
 Cameron Kaiser, Floodgap Systems Ltd * So. Calif., USA * ckaiser@xxxxxxxxxxxx
-- "Another day, another dangling modifier" -----------------------------------

