Complete.Org: Mailing Lists: Archives: offlineimap: March 2007:
Re: [PATCH] Convert LocalStatus to sqlite
Home

Re: [PATCH] Convert LocalStatus to sqlite

[Top] [All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index] [Thread Index]
To: Stewart Smith <stewart@xxxxxxxxxxxxxxxx>
Cc: offlineimap@xxxxxxxxxxxx
Subject: Re: [PATCH] Convert LocalStatus to sqlite
From: John Goerzen <jgoerzen@xxxxxxxxxxxx>
Date: Wed, 28 Mar 2007 10:29:46 -0500

On Wed, Mar 28, 2007 at 10:03:38PM +1000, Stewart Smith wrote:
> Forgive the potentially bad python, not my native tongue :)

Hi Stewart,

Thanks for the report, idea, and especially for backing it up with code.

Before applying, however, I want to make sure that I have a good
understanding of the underlying problems and of your fix.

> This patch is motivated by three things:
> - offlineimap is extremely slow at syncing lots of locally deleted
> messages
> - offlineimap uses lots of memory

Regarding these two points, I have never seen these problems except when
using one of the GUIs (which I will be removing from OfflineIMAP
anyhow).  Can you provide some details on your mailbox sizes and setup?

> - LocalStatus files aren't written safely (a hard crash can cause
> corruption)

Please explain how this is so.

I went back and double-checked the code and it looks completely safe to
me.  The new data is written only after a folder syncs properly.  It is
written to a temporary file.  That file is then closed, and only if all
of that worked fine, it is renamed on top of the old file.

If there is a problem at any point, you should have either the old or
the new data intact.  Believe me, I've had plenty of OfflineIMAP
exceptions in my time and have not had trouble with this ;-)

The only thing I can spot is that maybe we ought to fsync() the
individual message files when wrotten before closing the status file.

That would be a trivial patch.  I'll write it up shortly.

>       - I've been bitten by this in the past, causing a complete resync of
> the folder... so I get duplicate messages.

And this could happen with your patch as well, if a message is synced
and stored to disk before the corresponding status is committed.  I do
not believe your patch would fix this.

> In my tests, execution time for a normal sync is relatively the same.
> 
> Execution time for when lots of messages have been deleted in a
> reasonably sized folder (e.g. during re-organisation of mail folders) is
> as much as 10x faster.

Can you post the output from time(1) on the same operation with the old
approach and the new?  Also, how many messages were in the box before
the delete, and how many after?

> I had disable the threading for copying messages as this means that
> LocalStatus objects are shared between threads, which pysqlite doesn't
> like (it asserts).

I noticed that.  I will not be applying this patch until that is fixed
for sure.  Losing threading would be a significant regression for a lot
of people for the benefit of one particular isolated case.

> A future patch may convert other storage types to sqlite (or similar) to
> further reduce memory consumption (and hopefully runtime).

A couple of other nits:

* Your patch doesn't apply cleanly to the source tree as it exists in
  Darcs.

* There appear to be some indentation problems (which may be related
  to the patch not applying cleanly)  This will likely lead to trouble.
  These exist in your new version too, but also here...


>  class LocalStatusFolder(BaseFolder):
> +    def __deinit__(self):
> +     self.save()
> +        self.cursor.close()
> +     self.connection.close()
> +

That's one example.

> -            os.unlink(self.filename)
> +         self.cursor.close()
> +         self.connection.close()
> +            os.unlink(self.dbfilename)

Here's another.

Once again, thanks much for the submission!

-- John



[Prev in Thread] Current Thread [Next in Thread]