[gopher] The archive

To: gopher@xxxxxxxxxxxx
Subject: [gopher] The archive
From: John Goerzen <jgoerzen@xxxxxxxxxxxx>
Date: Fri, 6 Oct 2006 00:55:33 -0500
Reply-to: gopher@xxxxxxxxxxxx

First off, thanks to everyone who has expressed interest in this.
I have your emails and will get back to you.  I've been rather busy
lately with the birth of our first baby [1] and our upcoming move in
about a week, so it will likely be some time before I actually get
anything sent off.

I also realized that quux.org had never been included in the run,
since it was large and I could populate it from local backups
instead; I have now done so.

I'd also like to document the directory structure.  It is, roughly:

gopher-arch/gopher/hostname/portnumber/selector

Where the selector is a Gopher menu, it will appear as a directory
containing a file named .gophermap.  That file holds the raw Gopher
menu data that the server sent.  It should be easily usable by
PyGopherd and Bucktooth with only minor modifications.
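
For example, a menu reachable as selector /Software on a hypothetical
server example.org, port 70 (both names made up purely to illustrate
the layout), would unpack to:

gopher-arch/gopher/example.org/70/Software/.gophermap

Once you have the tree unpacked, a quick sketch like this will show
how many menus were captured per host:

# count captured menus (.gophermap files) per hostname
find gopher-arch/gopher -name .gophermap | cut -d/ -f3 | sort | uniq -c | sort -rn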

I have run a duplicate-file detector across the entire archive; any
duplicate files in it are hardlinked together, which saved about 10G
of space.  If you unpack on Windows, where the hardlinks generally
won't be preserved, expect the archive to consume about 10G more than
it would on a Unix.
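
If you're curious which files got deduplicated, a rough sketch on a
GNU system (after unpacking on a Unix filesystem) is:

# list files whose data is shared with at least one other file
# (link count > 1), sorted by inode so hardlinked copies group together
find gopher-arch/ -type f -links +1 -printf '%i %p\n' | sort -n | less

Note that du counts each set of hardlinked files only once, so it
will report the smaller, deduplicated size.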

I also have a dump of the PostgreSQL database behind the robot (10M
compressed, 200M uncompressed, 1.2G when loaded into PostgreSQL).  I
will toss that on the DVD as well for anyone who's interested.
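
Assuming the dump ends up as a plain-text pg_dump file compressed
with bzip2 (the filename, compression, and database name below are
all just placeholders), restoring it would look roughly like:

# create an empty database and stream the dump into it
createdb gopher-robot
bzcat gopher-robot.sql.bz2 | psql gopher-robot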

The DVDs will be generated with:

tar -cvf - gopher-arch/ | bzip2 -9 | split -d -b 4200m - gopher-arch.tar.bz2.

That is, each DVD will contain a slice of the tar'd+bzipped
directory.  If you get a set of DVDs, you can read them all in and
then simply run:

cat gopher-arch.tar.bz2.* | bzcat | tar -xvf -
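
Since DVDs can develop read errors, it may be worth verifying that
all the pieces decompress cleanly before doing a full extraction;
bzcat will complain if any block is corrupt:

# stream every piece through the decompressor without writing anything
cat gopher-arch.tar.bz2.* | bzcat > /dev/null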

Some Gopher servers do not use the slash as a path separator in their
selectors.  Those servers end up with a huge number of files and
directories at their top level -- possibly thousands -- so you will
need an efficient, modern filesystem to extract them in their
entirety.  There aren't many such servers, though.
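
If you want to spot those servers ahead of time, one rough way, once
the tree is unpacked, is to count the entries directly under each
host/port directory:

# show the ten host/port directories with the most top-level entries
for d in gopher-arch/gopher/*/*/; do
    printf '%s %s\n' "$(ls -A "$d" | wc -l)" "$d"
done | sort -rn | head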

I will get back to everyone once I have the time to send out the DVDs.

[1] http://changelog.complete.org/posts/545-The-News.html



