Complete.Org: Mailing Lists: Archives: offlineimap: June 2004:
Re: Status update
To: offlineimap@xxxxxxxxxxxx
Subject: Re: Status update
From: John Goerzen <jgoerzen@xxxxxxxxxxxx>
Date: Fri, 4 Jun 2004 22:11:07 -0500

On Fri, Jun 04, 2004 at 07:12:59PM -0400, Aron Griffis wrote:
> John Goerzen wrote:   [Fri Jun 04 2004, 12:11:35PM EDT]
> > First, I never found the time to finish the switchover to Twisted.
> > Therefore, unless somebody else takes it up, the Twisted (4.99.x/5.x)
> > branch of OfflineIMAP is dead.
> 
> Could you give a quick summary of what this means?  I haven't done any
> OfflineIMAP development (yet) but I'd like to know what a switchover
> to Twisted would buy us.

Sure.  Bear in mind as you notice the length of this message that you
asked for it :-)

Since version 2.0.0, OfflineIMAP has been capable of communicating with
the IMAP server over multiple channels simultaneously.  This capability
is solely a performance-related one.  To be a correct and safe program,
OfflineIMAP must send commands to the IMAP server and wait for the
responses to come back.  On high-latency links, or even just a regular
Internet pipe, this causes a noticeable slowdown.  While OfflineIMAP is
waiting for a response, the link sits unused.

To improve OfflineIMAP's performance, it can open up several connections
to the IMAP server.  It can be downloading two messages while it's
scanning a third folder, for instance.  This often results in dramatic
speedups.  My own benchmarks showed a reduction from 20 to 9 seconds
going from OfflineIMAP 1.x to 2.x.  (This time was further reduced to 2
seconds by OfflineIMAP 3.x.)

To be able to handle these multiple connections at once, there are two
main options available: multitasking and asynchronous I/O.  OfflineIMAP
to date has chosen multitasking in the form of Python threads.
Threading essentially allows execution to proceed in parallel; each
"thread of execution" pursues its own course.  For instance, there may
be a thread to sync a folder and a thread to download some messages.
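
A minimal sketch of that idea in Python, not OfflineIMAP's actual code:
the function names are hypothetical, and a sleep stands in for the
network round trip.  Three simulated commands run in their own threads,
so the total wait is roughly one round trip rather than three.

```python
import threading
import time


def fake_imap_command(name, results, delay=0.2):
    # Stand-in for one IMAP round trip; the sleep models link latency.
    time.sleep(delay)
    results.append(name)  # list.append is atomic in CPython


def run_in_threads(names):
    """Run one simulated command per thread and time the whole batch."""
    results = []
    threads = [
        threading.Thread(target=fake_imap_command, args=(name, results))
        for name in names
    ]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.monotonic() - start


if __name__ == "__main__":
    done, elapsed = run_in_threads(["sync INBOX", "fetch 1", "fetch 2"])
    print(sorted(done), round(elapsed, 2))
```

Run one after another, the three commands would take about 0.6 seconds;
in threads the waits overlap.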

Multithreaded programs must be carefully designed to maintain data
integrity.  For instance, if two threads simultaneously opened and
attempted to write data to an OfflineIMAP status file, gibberish could
result.  Various techniques such as locks and semaphores exist and are
used in OfflineIMAP to prevent such problems from occurring.  While these
mechanisms are well-understood, and OfflineIMAP's code has remained
almost entirely untouched since the release of v4.0 (meaning it is
likely quite stable), there is a measure of complexity to any
multitasking program.
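
As a rough illustration of the locking technique (the file format and
all names here are made up for the example, and this is not how
OfflineIMAP's status files are actually written), a single lock can
serialize writers so that records never interleave:

```python
import os
import tempfile
import threading

# Hypothetical lock guarding one shared status file.
status_lock = threading.Lock()


def append_status(path, uid, flags):
    # Only one thread at a time may hold the lock, so each record is
    # written whole before another thread can touch the file.
    with status_lock:
        with open(path, "a") as f:
            f.write(f"{uid}:{flags}\n")


def demo(n_threads=8, per_thread=50):
    """Have several threads append records, then return the file's lines."""
    fd, path = tempfile.mkstemp()
    os.close(fd)

    def worker(base):
        for i in range(per_thread):
            append_status(path, base + i, "S")

    threads = [
        threading.Thread(target=worker, args=(k * per_thread,))
        for k in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    with open(path) as f:
        lines = f.read().splitlines()
    os.remove(path)
    return lines
```

Every line that comes back should be one intact "uid:flags" record,
whatever order the threads happened to run in.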

Twisted is a Python infrastructure around asynchronous I/O.  An
asynchronous program uses a single thread/process.  Whereas each thread
in a multithreaded app tends to use conventional blocking I/O, the
single process in an asynchronous program uses the select() or poll()
calls at the heart of its code.  It will wait for something to happen
over its connections -- they may be ready to accept more data, or have
data waiting to be read, for instance.  It will then send or receive
the data, process it, and resume waiting.  At the heart of an
asynchronous program is the assumption that the network is far slower
than the processing the program must do.  Fortunately, this is almost
always the case and OfflineIMAP is no exception.

An asynchronous program need not worry about synchronization, locks, or
semaphores because it is a single process.  However, those problems are
traded for two others: keeping state and never blocking.

Never blocking means that you can never do a simple read() or write() to
or from the network.  You may only do those when select() or poll() says
you can.  You have to keep track of your state so you know what to do
when data is received or can be sent.  This makes up most of the
complexity of asynchronous programs.
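
To make the state-keeping concrete, here is a small sketch using only
Python's standard select and socket modules (no Twisted).  Local socket
pairs stand in for server connections, the IMAP-looking strings are
purely illustrative, and the function names are hypothetical.  The
per-connection buffer is the "state": a line may arrive in pieces, so
each piece has to be remembered until the newline shows up.

```python
import select
import socket


def read_lines_async(conns):
    """Collect one newline-terminated line from each socket via select()."""
    buffers = {c: b"" for c in conns}  # per-connection state
    lines = {}
    pending = set(conns)
    while pending:
        # Sleep until at least one connection has data to read.
        readable, _, _ = select.select(list(pending), [], [])
        for c in readable:
            chunk = c.recv(4096)  # safe: select() reported it readable
            buffers[c] += chunk
            if not chunk or b"\n" in buffers[c]:
                lines[c] = buffers[c].split(b"\n", 1)[0]
                pending.discard(c)
    return [lines[c] for c in conns]


def demo():
    # Each pair is (our end, "server" end).
    pairs = [socket.socketpair() for _ in range(2)]
    ours = [p[0] for p in pairs]
    for c in ours:
        c.setblocking(False)
    # The first "server" sends its line in two pieces.
    pairs[0][1].sendall(b"* OK ")
    pairs[1][1].sendall(b"* 3 EXISTS\n")
    pairs[0][1].sendall(b"ready\n")
    try:
        return read_lines_async(ours)
    finally:
        for a, b in pairs:
            a.close()
            b.close()
```

Even this toy needs a dictionary of buffers and a set of pending
connections; a real protocol needs far more of this bookkeeping.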

Twisted is a Python infrastructure to help you keep the state.  You tell
Twisted what you want to have happen when data arrives, and it calls
your function with the data.  It is an ingenious infrastructure that
simplifies the structure of an asynchronous program.

However, once you step down the asynchronous path, everything must be
asynchronous, including the UI.  So all OfflineIMAP's interfaces would
have to be ripped out and rewritten for use with Twisted.  No Twisted
interface for curses exists, so it would have to be written.

A large part of the reason for considering Twisted was the bugginess of
threading in Python on certain platforms.  Python 2.2 and early versions
of 2.3 were incapable of running OfflineIMAP correctly on certain
platforms.

Recent testing indicates that these problems are mostly, if not
completely, gone.  The original reason to move to Twisted thus no longer
exists, and frankly, I've lost interest in the project because, well...
OfflineIMAP works.

Frankly, I consider OfflineIMAP to be mostly done.  I have accomplished
everything I set out to with it.  There are a few more features I'd like
(and now that I'm running it on my Zaurus PDA, I want a message size
filter <g>), but nothing big.  So to me the effort to continue the
Twisted rewrite just wouldn't pay off.

However, I'll make the code available to anyone that would like a copy.
(It's in my arch repository already.)

For 4.x, patches will be accepted.  I would like to move OfflineIMAP
into an arch repository and try to move it towards a more bazaar style
of development.  That way, people that have ideas of new places to take
it can develop them, run it by the community, and get them integrated.

BTW I notice you are a Gentoo developer.  (Great!  I'm a Debian
developer... you may be interested in my Debian From Scratch rescue
environment at http://people.debian.org/~jgoerzen/dfs/.  It's a rescue
disk and installer modeled after Gentoo's.)  If there's anything I can
do to help you out, please let me know.  I expect your port should be
trivial since OfflineIMAP has no dependencies save Python (though it can
use Python-Tk if installed).  But if there are ever patches, please try
to make them generic and send them my way.

I tried out Gentoo briefly about a month ago and noticed that
OfflineIMAP was masked on certain architectures.  This is probably an
incorrect action.  OfflineIMAP is written in pure Python.  If there is a
problem on one arch or another, the problem would almost certainly be in
Python instead of OfflineIMAP (so Python should possibly be masked on
those archs).  I believe I reported a bug about this but I'm not sure
what happened to it :-)  I personally run it regularly on i386, amd64,
alpha, and arm (all Debian) and have run it on NetBSD, and can vouch for
its cross-platformness :-)  (The curses interface sucks on NetBSD, but
as far as I can tell, that's because curses sucks on NetBSD  <g>)

-- John

