Complete.Org: Mailing Lists: Archives: offlineimap: March 2007:
Re: A bit unrelated to: [PATCH] Convert LocalStatus to sqlite

To: David Favro <offlineimap@xxxxxxxxxxxxxxxx>
Cc: offlineimap@xxxxxxxxxxxx
Subject: Re: A bit unrelated to: [PATCH] Convert LocalStatus to sqlite
From: Stewart Smith <stewart@xxxxxxxxxxxxxxxx>
Date: Thu, 29 Mar 2007 11:03:26 +1000

On Thu, 2007-03-29 at 02:26 +0700, David Favro wrote:
> Stewart Smith wrote:
> > Server is courier-imap serving out of Maildir. [...]  Disk is RAID1
> > running XFS.
> > [...] about 153 mailboxes. Largest is about 170,000 messages... at least
> > one other is a bit over 100,000. 5-10k messages fairly common. 30k messages
> > in a few folders.
> >
> > The typical test was deleting about 20,000 messages from a 120,000
> > message folder.
> >
> 
> Not regarding the patch, nor offlineimap performance, but your numbers
> make me think about something that's bothered me about Maildir for a
> long time... I don't know much (anything) about XFS, but in general it
> strikes me as unlikely that most filesystems can provide decent
> performance when accessing, or opening a file from, a single directory
> that contains, e.g., 120k or 170k files -- certainly older OS's will
> perform poorly.  It seems like a glaring defect in the whole Maildir
> format (as well as modern data storage in general, which I have thought
> for some time is ill-suited to the hierarchical directory structure of
> most filesystems).  I have wanted to write an SQL-based mailbox for a
> long time but haven't gotten around to it, and recently noticed that
> these guys have done it already, though I haven't tried it yet:
> http://dbmail.org/

I'm not running older OS's :) It is a problem with the Maildir format
though, and there isn't a good way around it. Even if you partitioned
things up into multiple cur directories, you're still going to be
accessing a lot of inodes to scan through the folder (or back up the
disk).
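
To make the partitioning point concrete, here is a rough sketch in Python. The
"cur-XX" bucket layout and the function names are hypothetical (nothing in
Maildir or offlineimap defines them); it only shows that a full scan still
visits one directory entry per message, however the messages are split up.

```python
import hashlib
import os
import tempfile


def bucket_for(filename: str, buckets: int = 16) -> str:
    """Map a message filename to a bucket directory name (hypothetical layout)."""
    # Hash only the unique part before the ":2,<flags>" suffix, so that a
    # flag change (which renames the file) keeps the message in its bucket.
    unique = filename.split(":", 1)[0]
    digest = hashlib.sha1(unique.encode("utf-8")).digest()
    return f"cur-{digest[0] % buckets:02x}"


def scan(root: str) -> int:
    """Count messages under root. Every entry in every bucket is visited."""
    count = 0
    for bucket in os.scandir(root):
        if bucket.is_dir():
            count += sum(1 for e in os.scandir(bucket.path) if e.is_file())
    return count


def count_partitioned(n_messages: int) -> int:
    """Create n_messages empty files spread over buckets, then scan them."""
    with tempfile.TemporaryDirectory() as root:
        for i in range(n_messages):
            name = f"1175130206.{i}.host:2,S"  # made-up Maildir-style name
            bucket_dir = os.path.join(root, bucket_for(name))
            os.makedirs(bucket_dir, exist_ok=True)
            open(os.path.join(bucket_dir, name), "w").close()
        return scan(root)


if __name__ == "__main__":
    print(count_partitioned(1000))
```

Each bucket directory is smaller, so a single lookup by name may be cheaper,
but the total number of entries a sync or backup has to walk is unchanged.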

Probably a better format would be Maildir with support for multiple mbox
files for archived messages, along with some sort of index for them...
or a database, yeah.
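
A minimal sketch of the "mbox archive plus index" idea, using Python's
standard mailbox and sqlite3 modules. The table name, the function names and
the choice of Message-ID as the lookup key are my assumptions for
illustration, not an existing format or anything offlineimap does.

```python
import mailbox
import sqlite3


def archive(messages, mbox_path, db_path):
    """Append messages to an mbox file and index them by Message-ID."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS idx "
        "(msgid TEXT PRIMARY KEY, mbox TEXT, key INTEGER)"
    )
    box = mailbox.mbox(mbox_path)
    box.lock()
    try:
        for msg in messages:
            # mbox keys are sequence numbers; this sketch assumes nothing is
            # ever removed from the archive, so they stay valid on reopen.
            key = box.add(msg)
            db.execute(
                "INSERT OR REPLACE INTO idx VALUES (?, ?, ?)",
                (msg["Message-ID"], mbox_path, key),
            )
    finally:
        box.close()  # flushes pending writes and releases the lock
    db.commit()
    db.close()


def fetch(msgid, db_path):
    """Return the archived message with this Message-ID, or None."""
    db = sqlite3.connect(db_path)
    row = db.execute(
        "SELECT mbox, key FROM idx WHERE msgid = ?", (msgid,)
    ).fetchone()
    db.close()
    if row is None:
        return None
    box = mailbox.mbox(row[0], create=False)
    try:
        return box.get_message(row[1])
    finally:
        box.close()
```

One file per archive means one inode to back up instead of one per message.
The index is what keeps a single-message lookup from turning into a
search through the whole mbox file.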

The big thing with new storage formats (e.g. database) is having all the
clients talk to it.

> Anyhow, just a comment, but I was also wondering what kind of
> performance you get, Stewart, on those mailboxes.  Did you ever consider
> a different storage format?  Does anyone know if linear search is still
> typical of filesystem directory traversal or are people using hashes or
> btrees these days?  That would also seem odd because it's overkill for
> the more typical FS directory that contains <100 files.

I get decent enough performance - evolution also keeps some indexes to
help speed things up.

XFS will change the format of the directory on disk as needed. Small
directories are stored in the directory inode, larger ones as a linear
list, and even larger ones as a B+Tree.
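
As a toy illustration of why the structure matters at this scale (this is not
XFS code; a sorted Python list with binary search stands in for a tree index,
and the entry names are made up), compare how many entries each approach has
to examine to find one name:

```python
def linear_lookup(names, target):
    """Scan an unsorted list. Returns (found, entries examined)."""
    for examined, name in enumerate(names, start=1):
        if name == target:
            return True, examined
    return False, len(names)


def indexed_lookup(sorted_names, target):
    """Binary search over a sorted list. Returns (found, probes made)."""
    lo, hi, probes = 0, len(sorted_names), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_names[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_names) and sorted_names[lo] == target
    return found, probes


if __name__ == "__main__":
    # Zero-padded names, so list order is also sorted order.
    names = [f"msg{i:06d}" for i in range(170_000)]
    print(linear_lookup(names, names[-1]))   # worst case: examines every entry
    print(indexed_lookup(names, names[-1]))  # a handful of probes
```

For 170,000 entries the linear scan can examine all of them, while the
indexed lookup needs at most 18 probes.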

ext3 also has a directory indexing option (dir_index) - and while
typically not as efficient as XFS, it gives much better performance for
large directories than ext3's default linear directories.

With only 100-200k files, it's not the file system that's the limiting
factor... it's certainly offlineimap.
-- 
Stewart Smith (stewart@xxxxxxxxxxxxxxxx)
http://www.flamingspork.com/
