[gopher] Veronica-2 again, and one last robots.txt argument
The new crawler just took its first step tonight by taking a harnessed walk
around gopher.floodgap.com. It reads and understands robots.txt files (the
User-agent is veronica, or *), correctly traverses trees, and generates sane
indexes. Loop protection and auto-pruning will get tested later.
Revisiting robots.txt for a bit, the current logic has the following
consequences.
* If you Disallow: / in your robots.txt file, not only will your site not be
indexed, but its very existence will not even be registered in the statistics
table (and consequently it will not appear on the master list of servers).
* Disallow: intentionally says nothing about the itemtype, both because it is
selector-oriented and because at least one person here (John) wanted as much
overlap as possible between the Web and gopher robots.txt files, so that one
filesystem can be presented both ways and the same robots.txt understood by
both V-2 and any web robots.
The consequence is this: any gopher server that requires an "internal"
itemtype to be transmitted back to it (URLs like x.yz.com:70/11/something,
where the actual selector is 1/something) MUST include that selector in the
Disallow: block (e.g., for this example, Disallow: 1/something; see the
sample robots.txt after this list).
* Disallow: /path/ works for both /path and /path/ (not substrings of same);
a sketch of this matching rule is at the end of the message.
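Putting these rules together, an illustrative robots.txt might look like the
following (the paths are invented for the example and carry no special
meaning):

    User-agent: veronica
    Disallow: /private/
    Disallow: 1/internal

The first Disallow: covers the /private selector and everything under
/private/; the second covers a tree served with an internal itemtype whose
real selectors begin with 1/internal. A server that wants to stay out of the
index and the statistics table entirely would instead use a single
Disallow: /.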
If this will cause trouble for people, advise ASAP. I'm planning to unleash
the crawler sometime in the next week or two.
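For the record, the trailing-slash matching rule above amounts to roughly the
following. This is a sketch in Python of the behaviour as described, not the
crawler's actual code, and the function name is made up:

    def blocked(selector, pattern):
        # Disallow: / shuts the whole server out of the index.
        if pattern == '/':
            return True
        # "Disallow: /path/" covers /path and anything under /path/,
        # but not substrings such as /pathology.
        p = pattern.rstrip('/')
        return selector == p or selector.startswith(p + '/')

    # blocked('/path', '/path/')           -> True
    # blocked('/path/sub', '/path/')       -> True
    # blocked('/pathology', '/path/')      -> False
    # blocked('1/internal/x', '1/internal') -> True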
--
---------------------------------- personal: http://www.armory.com/~spectre/ --
Cameron Kaiser, Floodgap Systems Ltd * So. Calif., USA * ckaiser@xxxxxxxxxxxx
-- If you want divine justice, die. -- Nick Seldon ----------------------------