[gopher] Re: New Gopher Wayback Machine Bot
To: gopher@xxxxxxxxxxxx
Subject: [gopher] Re: New Gopher Wayback Machine Bot
From: John Goerzen <jgoerzen@xxxxxxxxxxxx>
Date: Wed, 12 Oct 2005 21:52:33 -0500
Reply-to: gopher@xxxxxxxxxxxx

On Wed, Oct 12, 2005 at 04:45:56PM -0700, Cameron Kaiser wrote:
> > Cameron, floodgap.com seems to have some sort of rate limiting and keeps
> > giving me a Connection refused error after a certain number of documents
> > have been spidered.
> 
> I'm a little concerned about your project since I do host a number of large
> subparts which are actually proxied services, and I think even a gentle bot
> going methodically through them would not be pleasant for the other side
> (especially if you mean to regularly update your snapshot).

Valid concern.  I had actually already marked your site off-limits
because I noticed that.  Incidentally, your robots.txt doesn't seem to
disallow anything -- might want to take a look at that ;-)

[snip]

> I do support robots.txt, see
> 
>       gopher.floodgap.com/0/v2/help/indexer

Do you happen to have the source code for that available?  I've got
some questions for you that it could explain (or you could), such as:

1. Which would you use?  (Do you expect URLs to be HTTP-escaped?)

    Disallow: /Applications and Games
    Disallow: /Applications%20and%20Games

2. Do you assume that all Disallow patterns begin with a slash as they
   do in HTTP, even if the Gopher selector doesn't?

3. Do you have any special code to handle the UMN case where
   1/foo, /foo, and foo all refer to the same document?
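
One way to sidestep all three questions on the bot side is to reduce both
the Disallow patterns and the selectors to a single canonical form before
comparing them.  What follows is only a rough sketch under stated
assumptions, not a description of how floodgap's indexer behaves: the
names normalize, is_disallowed, and ITEM_TYPE_PREFIXES are made up for
illustration, and the rule for stripping a leading item type is a guess
at the UMN behaviour that could misfire on a directory literally named
"1".

```python
from urllib.parse import unquote

# Assumption: only these item-type characters are stripped when they
# appear as a "<type>/" prefix, as in the UMN-style "1/foo".
ITEM_TYPE_PREFIXES = "01"


def normalize(selector: str) -> str:
    """Reduce a selector or Disallow pattern to one canonical form."""
    # Question 1: treat "%20" and a literal space as the same thing.
    s = unquote(selector)
    # Question 3: "1/foo", "/foo", and "foo" should all compare equal,
    # so drop a leading item-type character that is followed by "/".
    if len(s) >= 2 and s[0] in ITEM_TYPE_PREFIXES and s[1] == "/":
        s = s[1:]
    # Question 2: force exactly one leading slash either way.
    return "/" + s.lstrip("/")


def is_disallowed(selector: str, patterns: list[str]) -> bool:
    """Return True if the selector falls under any Disallow prefix."""
    sel = normalize(selector)
    # An empty "Disallow:" value conventionally means "allow everything",
    # so blank patterns are skipped rather than normalized to "/".
    return any(sel.startswith(normalize(p)) for p in patterns if p.strip())
```

With this, is_disallowed("1/Applications and Games/x",
["/Applications%20and%20Games"]) comes out true whichever of the two
Disallow spellings the site uses.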

I will be adding robots.txt support to my bot and restarting it shortly.

Thanks,

-- John



-- 
John Goerzen
Author, Foundations of Python Network Programming
http://www.amazon.com/exec/obidos/tg/detail/-/1590593715


