[gopher] Re: New Gopher Wayback Machine Bot
On Wed, Oct 12, 2005 at 04:45:56PM -0700, Cameron Kaiser wrote:
> > Cameron, floodgap.com seems to have some sort of rate limiting and keeps
> > giving me a Connection refused error after a certain number of documents
> > have been spidered.
>
> I'm a little concerned about your project since I do host a number of large
> subparts which are actually proxied services, and I think even a gentle bot
> going methodically through them would not be pleasant for the other side
> (especially if you mean to regularly update your snapshot).
Valid concern. I had actually already marked your site off-limits
because I noticed that. Incidentally, your robots.txt doesn't seem to
disallow anything -- might want to take a look at that ;-)
[snip]
> I do support robots.txt, see
>
> gopher.floodgap.com/0/v2/help/indexer
Do you happen to have the source code for that available? I've got
some questions that the code (or you) could answer, such as:
1. Which of these forms would you honor? (That is, do you expect the
   selectors in robots.txt to be URL-escaped, %20 and so on?)
     Disallow: /Applications and Games
     Disallow: /Applications%20and%20Games
2. Do you assume that all Disallow patterns begin with a slash, as they
   do in HTTP robots.txt files, even if the Gopher selector doesn't?
3. Do you have any special code to handle the UMN case where
1/foo, /foo, and foo all refer to the same document?
I will be adding robots.txt support to my bot and restarting it shortly.
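
For what it's worth, here is roughly the selector normalization I have
in mind for my own bot before matching Disallow rules. This is just a
sketch in Python; the unescaping, the forced leading slash, and the
stripping of a UMN-style "1/" item-type prefix are all guesses on my
part about sensible behavior, not anything your indexer actually does:

    from urllib.parse import unquote

    def normalize_selector(selector):
        """Reduce a gopher selector to a canonical form for prefix matching."""
        sel = unquote(selector)          # treat "%20" and a literal space alike
        # Naive heuristic for UMN gopherd: strip a leading item-type digit
        # (e.g. the "1" in "1/foo") so 1/foo, /foo, and foo compare equal.
        if len(sel) > 1 and sel[0] in "01" and sel[1] == "/":
            sel = sel[1:]
        if not sel.startswith("/"):
            sel = "/" + sel              # assume rules are written with a leading slash
        return sel

    def is_disallowed(selector, disallow_patterns):
        """True if any (normalized) Disallow pattern is a prefix of the selector."""
        sel = normalize_selector(selector)
        return any(sel.startswith(normalize_selector(p)) for p in disallow_patterns)

    # All three spellings of the same selector match one Disallow rule:
    rules = ["/Applications and Games"]
    for s in ("1/Applications%20and%20Games/doom",
              "/Applications and Games/doom",
              "Applications and Games/doom"):
        print(s, "->", is_disallowed(s, rules))

If your indexer does something materially different, I'd rather match
its behavior than invent my own.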
Thanks,
-- John
--
John Goerzen
Author, Foundations of Python Network Programming
http://www.amazon.com/exec/obidos/tg/detail/-/1590593715