Re: offlineimap optimisations
To: offlineimap@xxxxxxxxxxxx
Cc: martin f krafft <madduck@xxxxxxxxxxx>
Subject: Re: offlineimap optimisations
From: John Goerzen <jgoerzen@xxxxxxxxxxxx>
Date: Tue, 13 May 2008 08:00:04 -0500

On Tue May 13 2008 3:45:57 am martin f krafft wrote:
> Hi,
> trying to sync a 150k message mailbox (which isn't my first time),
> I ran into some performance issues which made me look at offlineimap
> strace output.
>
> There, I noticed two things:
>
> 1. offlineimap *churns* through PIDs. I know the programme uses
>    threads, but in my understanding, threads don't eat PIDs, only
>    forks do. So what is it doing, consuming somewhere in the
>    vicinity of 100 PIDs per minute?

That may turn out to be a Python issue.  OfflineIMAP uses threads, but does 
not fork.

Well, there are two exceptions to the "does not fork" rule:

1) If you are using the Curses interface, it could fork at the very beginning.

2) If you are using preauthtunnel to connect, it would fork for each
connection.

I do not know the internal details of Python's threading.  It is entirely
possible that the threading is causing this, and if so, it may not be the
problem you expect it to be.  You may be looking at new threads, created by
clone() or the like, rather than full new processes.
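
One way to check the threads-versus-processes guess is to compare what each thread reports as its process ID and its kernel thread ID. The sketch below is only an illustration and is not taken from OfflineIMAP; observe_threads is a made-up name, and threading.get_native_id() needs Python 3.8 or later. On Linux, strace prints the kernel thread ID in its "[pid N]" prefix, and thread IDs come from the same number space as PIDs, so a program that starts many short-lived threads can appear to churn through PIDs without ever forking.

```python
import os
import threading


def observe_threads(count=4):
    """Start `count` threads and record the (PID, native thread ID) each sees."""
    seen = []
    lock = threading.Lock()
    # The barrier keeps every thread alive until all have reported,
    # so no thread ID can be recycled while the others are still running.
    barrier = threading.Barrier(count)

    def worker():
        ids = (os.getpid(), threading.get_native_id())
        with lock:
            seen.append(ids)
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    for pid, tid in observe_threads():
        print("process", pid, "thread", tid)
```

Every line should show the same process number and a different thread number. If the numbers in the strace output behave like the thread column here, the PID churn is thread creation rather than forking.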

> 2. I see a lot of
>
>      [pid 19379] rename("/home/madduck/.var/offlineimap/Account-seamus.madduck.net/LocalStatus/spool.tmp", "/home/madduck/.var/offlineimap/Account-seamus.madduck.net/LocalStatus/spool") = 0
>
>    in the output, and it makes me think that offlineimap rewrites the
>    database after every message it receives. I know sqlite
>    integration is in the works, but maybe a journaling approach
>    would be better, writing changes to a journal and merging it in
>    on completion, or whenever there's time.

In offlineimap/folder/LocalStatus.py, there is a line that says:

        self.doautosave = 1

There is also a function called autosave() that is called after every
modification to the local status cache.  It simply checks whether doautosave
is true, and if so, saves off the local status cache.
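
The pattern described here, combined with the rename() calls in the strace output, can be sketched roughly as follows. This is a simplified stand-in and not the code from LocalStatus.py: the class name, the savemessage() signature, the savecount counter and the one-line-per-message file layout are all assumptions made for illustration. Only doautosave, autosave() and save() are names from the real module.

```python
import os
import tempfile


class StatusCacheSketch:
    """Hypothetical stand-in for the local status cache (not the real class)."""

    def __init__(self, filename):
        self.filename = filename
        self.messagelist = {}  # uid -> flags string (assumed layout)
        self.doautosave = 1
        self.savecount = 0     # illustration only: counts full rewrites

    def save(self):
        # Rewrite the whole cache into a temporary file, then rename it
        # over the real one.  This is the rename() visible in strace.
        tmpname = self.filename + ".tmp"
        with open(tmpname, "w") as f:
            for uid in sorted(self.messagelist):
                f.write("%d:%s\n" % (uid, self.messagelist[uid]))
        os.replace(tmpname, self.filename)
        self.savecount += 1

    def autosave(self):
        if self.doautosave:
            self.save()

    def savemessage(self, uid, flags):
        self.messagelist[uid] = flags
        self.autosave()  # called after every modification


def demo(count=3):
    """Store `count` messages in a scratch directory; return (rewrites, contents)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "spool")
        cache = StatusCacheSketch(path)
        for uid in range(1, count + 1):
            cache.savemessage(uid, "S")
        with open(path) as f:
            return cache.savecount, f.read()
```

In this sketch every stored message triggers one complete rewrite of the file, so the cost of syncing a folder grows with the square of the number of messages, which would match what the strace output suggests.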

Setting self.doautosave = 0 is probably not a simple fix yet, because I don't 
think save() will ever get called in that case.

But the infrastructure is there in autosave() to do something more 
intelligent, perhaps.
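
As one possible reading of "more intelligent", autosave() could save only after a batch of changes or after some time has passed, with an explicit final save at the end of a sync so that nothing is lost. The sketch below is an assumption about how that might look, not existing OfflineIMAP code: ThrottledAutosave, flush(), max_changes and max_seconds are all hypothetical names and values.

```python
import time


class ThrottledAutosave:
    """Hypothetical helper; none of these names exist in OfflineIMAP."""

    def __init__(self, save, max_changes=100, max_seconds=30.0,
                 clock=time.monotonic):
        self._save = save              # callable that writes the cache out
        self.max_changes = max_changes
        self.max_seconds = max_seconds
        self._clock = clock
        self._pending = 0
        self._last_save = clock()

    def autosave(self):
        """Call after every modification; saves only when a threshold is hit."""
        self._pending += 1
        too_many = self._pending >= self.max_changes
        too_old = self._clock() - self._last_save >= self.max_seconds
        if too_many or too_old:
            self.flush()

    def flush(self):
        """Save if anything is pending; call once when a sync finishes."""
        if self._pending:
            self._save()
        self._pending = 0
        self._last_save = self._clock()
```

With max_changes=100, a 150,000-message sync would do roughly 1,500 saves instead of 150,000, at the price of losing up to 100 status updates if the program is killed between saves. The final flush() call is what covers the case where save() would otherwise never run.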

I think that SQLite is a reasonable way forward on this, and Stewart Smith is
working on it.  See the thread at

http://lists.complete.org/offlineimap@xxxxxxxxxxxx/2008/03/msg00034.html.gz

There were a couple of regressions in the patch, as well as a concern about 
converting from the current format.

-- John


