[gopher] Re: Whats all this talk about?
> The box Veronica is on is a p200mmx
> Jughead is on another p200mmx
> FreeBSD for both.
> The list of sites is included with the
> About_Veronica_Search text and
> About_Multi_Search talks of Jughead
Ah, thanks (*reads it*).
> I am having problems with the .tree script in
> that there are no decent fallbacks for things
> like high latency or a lost connection; there is
> an "Alarm" sent in text and that ends the "tree-ing"
> for that site. This may be why the results are so far
> differing at times with yours Cameron.
Is the setup actually a crawler? It's not clear to me if you're using a
predigested index the outside sites provide, or if you're crawling it
yourself. I'm assuming based on
> I have shown which sites "Alarmed" and therefore
> are incomplete.
> For instance:
> gopher.semo.edu #alarm long way in
> that is to say after a long time and quite far
> in the tree I received an alarm which indicates
> one of several things: timeout, loss of connection,
> exceeded "depth" etc.
that you are crawling it yourself.
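
For illustration only, here is a minimal Python sketch of the kind of
fallback being asked about: a per-selector fetch that retries on a timeout
or a dropped connection and reports which of the two finally stopped it,
rather than ending the whole site on the first alarm. This is not the .tree
script or any crawler mentioned in this thread; the names fetch_selector and
FetchFailed, the retry count and the backoff values are all assumptions.

```python
import socket
import time


class FetchFailed(Exception):
    """Raised when every attempt fails; .reason records the last cause."""

    def __init__(self, reason, attempts):
        super().__init__(f"{reason} after {attempts} attempt(s)")
        self.reason = reason
        self.attempts = attempts


def fetch_selector(host, selector, port=70, timeout=10.0, retries=3,
                   backoff=1.0, connect=socket.create_connection,
                   sleep=time.sleep):
    """Fetch one Gopher selector, retrying on timeout or lost connection.

    connect and sleep are injectable (hypothetical hooks) so the retry
    logic can be exercised without touching the network.
    """
    reason = "unknown"
    for attempt in range(1, retries + 1):
        try:
            with connect((host, port), timeout) as sock:
                sock.settimeout(timeout)
                # A Gopher request is just the selector followed by CRLF.
                sock.sendall(selector.encode("latin-1") + b"\r\n")
                chunks = []
                while True:
                    data = sock.recv(4096)
                    if not data:  # server closed the connection: done
                        break
                    chunks.append(data)
                return b"".join(chunks)
        except socket.timeout:
            reason = "timeout"
        except OSError:
            reason = "connection lost"
        if attempt < retries:
            # Exponential backoff before the next try: 1x, 2x, 4x, ...
            sleep(backoff * 2 ** (attempt - 1))
    raise FetchFailed(reason, retries)
```

A caller could catch FetchFailed, log the reason next to the selector, and
carry on with the rest of the site's tree, so one slow menu only costs that
branch.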
> Cameron I think you are indexing more than I atm as
> well, with my raw data being about 20M and the data
> file being 10M with a 1M offset file and a 5M "other"
> file.
How many selectors does that translate to? For the record,
gopher% ls -sk    # in kilobytes
146408 history.MYD      3664 prospects.MYI      12 stats.frm
105496 history.MYI        12 prospects.frm  304968 textil.MYD
    12 history.frm         6 stats.MYD      391104 textil.MYI
  4696 prospects.MYD       9 stats.MYI          12 textil.frm
so not quite a gig so far. Note it is not full-text.
textil is the keyword and relevancy table, history is the selector/display
string database, prospects is the workspace table and stats is cached
precomputed statistics used for /world. This is with 1.1 million selectors,
give or take a couple thousand, using my regular "stupid" crawler library.
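
As a rough check on those figures, a small Python sketch that totals the
listing above and divides by the selector count. The sizes are copied from
the ls output; the 1,100,000 selector count is the rounded number quoted
above, so the per-selector result is only approximate, and it assumes
1 kilobyte = 1024 bytes.

```python
# Sizes in kilobytes, copied from the `ls -sk` listing.
sizes_kb = {
    "history.MYD": 146408, "history.MYI": 105496, "history.frm": 12,
    "prospects.MYD": 4696, "prospects.MYI": 3664, "prospects.frm": 12,
    "stats.MYD": 6, "stats.MYI": 9, "stats.frm": 12,
    "textil.MYD": 304968, "textil.MYI": 391104, "textil.frm": 12,
}

selectors = 1_100_000  # approximate: "give or take a couple thousand"

total_kb = sum(sizes_kb.values())
total_mb = total_kb / 1024
bytes_per_selector = total_kb * 1024 / selectors

print(f"total: {total_kb} KB ({total_mb:.0f} MB)")
print(f"about {bytes_per_selector:.0f} bytes per selector")
```

That comes to roughly 934 MB in total, or on the order of 890 bytes of
table and index space per selector.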
Mind you, this is not a competition :) I'm just curious about how you're
getting things up and running. So far you seem to be getting pretty good
results for an early effort, so you are to be congratulated.
--------------------------------- personal: http://www.armory.com/~spectre/ ---
Cameron Kaiser * Floodgap Systems * www.floodgap.com * ckaiser@xxxxxxxxxxxx
-- Hi! I am a .signature virus. Copy me into your .signature to join in! -----