[gopher] Veronica-2 again, and one last robots.txt argument
To: gopher@xxxxxxxxxxxx
Subject: [gopher] Veronica-2 again, and one last robots.txt argument
From: Cameron Kaiser <spectre@xxxxxxxxxxxx>
Date: Tue, 24 Jun 2003 22:39:42 -0700 (PDT)
Reply-to: gopher@xxxxxxxxxxxx

The new crawler just took its first step tonight by taking a harnessed walk
around gopher.floodgap.com. It reads and understands robots.txt files (the
User-agent is veronica, or *), correctly traverses trees, and generates sane
indexes. Loop protection and auto-pruning will get tested later.

Revisiting robots.txt for a bit, the current logic has the following
consequences.

* If you Disallow: / in your robots.txt file, not only will your site not be
  indexed, but its very existence will not even be registered in the
  statistics table (and consequently it will not appear on the master list of
  servers).

* Disallow: intentionally says nothing about the itemtype, both because this
  is selector-oriented, and because at least one person here (John) wanted as
  much overlap as possible between the Web and gopher robots.txt files, so
  that one filesystem can be presented both ways and the robots.txt
  understood by both V-2 and any web robots.

  The consequence is this. Any gopher server that requires an "internal"
  itemtype to be transmitted back to it (URLs like x.yz.com:70/11/something
  where the actual selector is 1/something) MUST include this in the
  Disallow: block (e.g., for this example, Disallow: 1/something).

* Disallow: /path/ works for both /path and /path/ (not substrings of same).
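
To make the three points above concrete, here is a rough Python sketch of how
a crawler might apply them. It is not the V-2 code. The function names
(parse_robots, selector_from_url, is_disallowed) are made up for illustration,
and two behaviours are assumptions on my part: that a rule also covers
selectors underneath it (/path/ covering /path/file), and that a record for
veronica takes precedence over the * record.

```python
def parse_robots(text, agent="veronica"):
    """Return the Disallow values that apply to `agent`, else those for *."""
    rules = {}        # lowercased user-agent -> list of Disallow values
    current = []      # agents named by the record currently being read
    in_rules = False  # True once the current record has had a Disallow line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a new record starts here
                current, in_rules = [], False
            current.append(value.lower())
            rules.setdefault(value.lower(), [])
        elif field == "disallow":
            in_rules = True
            if value:                         # an empty Disallow blocks nothing
                for ua in current:
                    rules[ua].append(value)
    # Assumption: a veronica record, if present, overrides the * record.
    return rules.get(agent, rules.get("*", []))


def selector_from_url(url):
    """Turn 'host:port/<itemtype><selector>' into the bare selector.

    Assumes a schemeless URL of the form used in the message, where the
    first character after the host's slash is the itemtype.
    """
    _, _, rest = url.partition("/")   # '11/something'
    return rest[1:]                   # '1/something'


def is_disallowed(selector, disallows):
    """True if `selector` is covered by any Disallow value."""
    for rule in disallows:
        base = rule.rstrip("/")
        if base == "":                # Disallow: / excludes the whole site
            return True
        # Matches /path and /path/ but not a longer name such as /pathology.
        # Assumption: anything below /path/ is covered as well.
        if selector == base or selector.startswith(base + "/"):
            return True
    return False


SAMPLE = """\
User-agent: veronica
Disallow: /path/
Disallow: 1/something

User-agent: *
Disallow: /
"""

rules = parse_robots(SAMPLE)
print(is_disallowed("/path", rules))
print(is_disallowed("/pathology", rules))
print(is_disallowed(selector_from_url("x.yz.com:70/11/something"), rules))
```

Note that the comparison is made against the selector the server actually
receives, which is why the internal-itemtype case needs its own Disallow:
line with the leading 1.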

If this will cause trouble for people, advise ASAP. I'm planning to unleash
the crawler sometime in the next week or two.

-- 
---------------------------------- personal: http://www.armory.com/~spectre/ --
 Cameron Kaiser, Floodgap Systems Ltd * So. Calif., USA * ckaiser@xxxxxxxxxxxx
-- If you want divine justice, die. -- Nick Seldon ----------------------------

