[PATCH] LocalStatus in sqlite (take2)

To: offlineimap <offlineimap@xxxxxxxxxxxx>
Subject: [PATCH] LocalStatus in sqlite (take2)
From: Stewart Smith <stewart@xxxxxxxxxxxxxxxx>
Date: Fri, 06 Jul 2007 17:23:59 +1000

Hi there!
This is the second take on the LocalStatus in sqlite patch. It applies to
near-latest darcs, as the latest darcs just doesn't work.

The aim of this patch is to:
a) reduce memory footprint on large mailboxes
b) reduce CPU usage maintaining LocalStatus
c) scale better with large mailboxes

Amazingly enough it does pretty much do this.
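
To make the idea concrete, here is a minimal sketch of status lookups
answered by SQL instead of an in-memory dict. It assumes Python 3 and the
standard-library sqlite3 module rather than the pysqlite2 binding the patch
imports. The class name StatusStore is hypothetical; the table layout and
method names mirror the patch below.

```python
import sqlite3


class StatusStore:
    """Hypothetical stand-in for a sqlite-backed LocalStatus folder."""

    def __init__(self, path=":memory:"):
        self.connection = sqlite3.connect(path)
        # Same table layout as the patch: one row per UID.
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS status "
            "(id INTEGER PRIMARY KEY, flags VARCHAR(50))")

    def savemessage(self, uid, flags):
        # Flags are stored sorted and joined into a single string.
        self.connection.execute(
            "INSERT INTO status (id, flags) VALUES (?, ?)",
            (uid, "".join(sorted(flags))))
        self.connection.commit()

    def uidexists(self, uid):
        # One indexed lookup; no need to load every message first.
        row = self.connection.execute(
            "SELECT 1 FROM status WHERE id = ?", (uid,)).fetchone()
        return row is not None

    def getmessagecount(self):
        return self.connection.execute(
            "SELECT count(id) FROM status").fetchone()[0]

    def getmessageflags(self, uid):
        row = self.connection.execute(
            "SELECT flags FROM status WHERE id = ?", (uid,)).fetchone()
        if row is None:
            raise KeyError(uid)
        return list(row[0])
```

Only the rows a query touches are pulled into Python, which is where the
memory saving on large mailboxes would come from.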

The additional patch (updateflags) commits less often, so that something
like marking 100,000 messages as read locally completes much faster than
it would when committing on every flag update.
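
A rough sketch of that batching, again assuming Python 3 and the stdlib
sqlite3 module. BatchedFlagWriter, batch_size and flush are hypothetical
names; the batch size of 50 is the one used in the updateflags patch at the
end of this mail.

```python
import sqlite3


class BatchedFlagWriter:
    """Hypothetical helper: commit once per batch of flag updates."""

    def __init__(self, connection, batch_size=50):
        self.connection = connection
        self.batch_size = batch_size
        self.pending = 0  # updates executed but not yet committed

    def savemessageflags(self, uid, flags):
        self.connection.execute(
            "UPDATE status SET flags = ? WHERE id = ?",
            ("".join(sorted(flags)), uid))
        self.pending += 1
        if self.pending >= self.batch_size:
            self.flush()

    def flush(self):
        self.connection.commit()
        self.pending = 0


# Usage: mark 120 messages as seen with three commits instead of 120.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE status (id INTEGER PRIMARY KEY, flags VARCHAR(50))")
connection.executemany(
    "INSERT INTO status (id, flags) VALUES (?, ?)",
    [(uid, "") for uid in range(1, 121)])
connection.commit()

writer = BatchedFlagWriter(connection)
for uid in range(1, 121):
    writer.savemessageflags(uid, ["S"])
pending_before_flush = writer.pending  # the tail of the last partial batch
writer.flush()  # commit whatever is left over
```

The final flush matters: without it, the updates in the last partial batch
stay uncommitted and would have to be redone after a crash.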

Performance:

- Tested on a single 411MB Maildir with 12,701 messages.
- Filesystems:
        - from remote XFS on raid1 (both disks on same IDE controller)
        - to local XFS
        - both are real file systems and well aged.
- Connection to IMAP server was via 802.11g directly to the imapd. The
only time the network was remotely saturated in bandwidth was when
syncing large mail messages.
- imapd was courier-imap.
- MaxConnections for the remote repository was 3.
        - NOTE: total sync time was no different between the unpatched and
patched versions, suggesting that threads for a single mailbox do
nothing to improve performance.
- preauthtunnel was ssh
- ui was TTYUI.

Disclaimer: I am no Python programmer... I hack clustered database
internals... so please do point out any newbie Python mistakes.

Unpatched offlineimap took:

For the initial sync (all messages):
                real    9m35.872s
                user    4m20.196s
                sys     0m39.158s

Resident memory grew from 22MB to 30MB to 40MB, and CPU usage climbed
from ~35% to ~70%.

The second sync (doing nothing) took:
                real    0m6.846s
                user    0m2.940s
                sys     0m0.724s



WITH THE PATCH::
----------------
With this patch, offlineimap took:
                real    9m26.239s
                user    0m43.595s
                sys     0m20.113s

Real time saved: about 10 seconds (insignificant)
CPU time saved: about 3 min 37 seconds of user time and 19 seconds of system time!
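
For anyone who wants to check the arithmetic, the differences can be
recomputed from the initial-sync `time` figures quoted above. The helper
name `seconds` is just for illustration.

```python
def seconds(minutes, secs):
    """Convert a 'XmY.YYYs' reading from time(1) into seconds."""
    return minutes * 60 + secs


# Initial-sync timings as reported in this mail.
unpatched = {"real": seconds(9, 35.872),
             "user": seconds(4, 20.196),
             "sys": seconds(0, 39.158)}
patched = {"real": seconds(9, 26.239),
           "user": seconds(0, 43.595),
           "sys": seconds(0, 20.113)}

saved = {key: unpatched[key] - patched[key] for key in unpatched}
for key, value in saved.items():
    print("%s: %.1f s saved" % (key, value))
```

This works out to roughly 9.6 s of real time, 216.6 s of user time and
19.0 s of system time.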

For the second (doing nothing) sync:
1st run:
                real    0m11.403s
                user    0m4.408s
                sys     0m1.320s
2nd run:        
                real    0m6.893s
                user    0m4.260s
                sys     0m1.392s

So it looks like the null sync may take a bit more real time and a bit
more CPU. IMHO this is an acceptable hit.


With the update flags patch (also included), marking ~13,000 messages as
read in localstatus takes essentially no time.

I also experimented with bulk committing new mail messages to
LocalStatus (every 10 and every 100 messages). It was possible to shave
~30 and ~60 seconds off the initial sync respectively, although I don't
think that is worth the possibility of duplicate messages at this
stage. I think there are other (better) ways to improve initial sync
performance than this.

I have been using this patch since I first mailed it back in March - and
am very happy with it (and have had no problems).


Comments and thoughts welcome!



Index: offlineimap/offlineimap/accounts.py
===================================================================
--- offlineimap.orig/offlineimap/accounts.py    2007-07-06 16:38:22.125378241 +1000
+++ offlineimap/offlineimap/accounts.py 2007-07-06 16:39:15.400414212 +1000
@@ -169,7 +169,7 @@ def syncfolder(accountname, remoterepos,
     ui.syncingfolder(remoterepos, remotefolder, localrepos, localfolder)
     ui.loadmessagelist(localrepos, localfolder)
     localfolder.cachemessagelist()
-    ui.messagelistloaded(localrepos, localfolder, len(localfolder.getmessagelist().keys()))
+    ui.messagelistloaded(localrepos, localfolder, localfolder.getmessagecount())
 
 
     # Load status folder.
@@ -187,7 +187,7 @@ def syncfolder(accountname, remoterepos,
     # validity problem, warn and abort.  If there are no messages, UW IMAPd
     # loses UIDVALIDITY.  But we don't really need it if both local folders are
     # empty.  So, in that case, just save it off.
-    if len(localfolder.getmessagelist()) or len(statusfolder.getmessagelist()):
+    if localfolder.getmessagecount() or statusfolder.getmessagecount():
         if not localfolder.isuidvalidityok():
             ui.validityproblem(localfolder)
            localrepos.restore_atime()
@@ -204,7 +204,7 @@ def syncfolder(accountname, remoterepos,
     ui.loadmessagelist(remoterepos, remotefolder)
     remotefolder.cachemessagelist()
     ui.messagelistloaded(remoterepos, remotefolder,
-                         len(remotefolder.getmessagelist().keys()))
+                         remotefolder.getmessagecount())
 
 
     #
Index: offlineimap/offlineimap/folder/Base.py
===================================================================
--- offlineimap.orig/offlineimap/folder/Base.py 2007-07-06 16:38:22.129378469 +1000
+++ offlineimap/offlineimap/folder/Base.py      2007-07-06 16:39:15.424415580 +1000
@@ -129,6 +129,24 @@ class BaseFolder:
         You must call cachemessagelist() before calling this function!"""
         raise NotImplementedException
 
+    def uidexists(self,uid):
+        """Returns true if uid exists"""
+       mlist = self.getmessagelist()
+       if uid in mlist:
+               return 1
+       else:
+               return 0
+       return 0
+
+    def getmessageuidlist(self):
+        """Gets a list of UIDs.
+        You may have to call cachemessagelist() before calling this function!"""
+       return self.getmessagelist().keys()
+
+    def getmessagecount(self):
+        """Gets the number of messages."""
+        return len(self.getmessagelist().keys())
+
     def getmessage(self, uid):
         """Returns the content of the specified message."""
         raise NotImplementedException
@@ -237,7 +255,7 @@ class BaseFolder:
         and once that succeeds, get the UID, add it to the others for real,
         add it to local for real, and delete the fake one."""
 
-        uidlist = [uid for uid in self.getmessagelist().keys() if uid < 0]
+        uidlist = [uid for uid in self.getmessageuidlist() if uid < 0]
         threads = []
 
         usethread = None
@@ -294,11 +312,10 @@ class BaseFolder:
         them to dest."""
         threads = []
         
-       dest_messagelist = dest.getmessagelist()
-        for uid in self.getmessagelist().keys():
+        for uid in self.getmessageuidlist():
             if uid < 0:                 # Ignore messages that pass 1 missed.
                 continue
-            if not uid in dest_messagelist:
+            if not dest.uidexists(uid):
                 if self.suggeststhreads():
                     self.waitforthread()
                     thread = InstanceLimitedThread(\
@@ -321,11 +338,10 @@ class BaseFolder:
         Look for message present in dest but not in self.
         If any, delete them."""
         deletelist = []
-       self_messagelist = self.getmessagelist()
-        for uid in dest.getmessagelist().keys():
+        for uid in dest.getmessageuidlist():
             if uid < 0:
                 continue
-            if not uid in self_messagelist:
+            if not self.uidexists(uid):
                 deletelist.append(uid)
         if len(deletelist):
             UIBase.getglobalui().deletingmessages(deletelist, applyto)
@@ -348,7 +364,7 @@ class BaseFolder:
         addflaglist = {}
         delflaglist = {}
         
-        for uid in self.getmessagelist().keys():
+        for uid in self.getmessageuidlist():
             if uid < 0:                 # Ignore messages missed by pass 1
                 continue
             selfflags = self.getmessageflags(uid)
Index: offlineimap/offlineimap/folder/IMAP.py
===================================================================
--- offlineimap.orig/offlineimap/folder/IMAP.py 2007-07-06 16:39:12.432245066 +1000
+++ offlineimap/offlineimap/folder/IMAP.py      2007-07-06 16:39:15.524421279 +1000
@@ -45,7 +45,7 @@ class IMAPFolder(BaseFolder):
         return self.accountname
 
     def suggeststhreads(self):
-        return 1
+        return 0
 
     def waitforthread(self):
         self.imapserver.connectionwait()
Index: offlineimap/offlineimap/folder/LocalStatus.py
===================================================================
--- offlineimap.orig/offlineimap/folder/LocalStatus.py  2007-07-06 16:38:22.141379152 +1000
+++ offlineimap/offlineimap/folder/LocalStatus.py       2007-07-06 16:39:15.692430853 +1000
@@ -19,21 +19,68 @@
 from Base import BaseFolder
 import os, threading
 
+from pysqlite2 import dbapi2 as sqlite
+
 magicline = "OFFLINEIMAP LocalStatus CACHE DATA - DO NOT MODIFY - FORMAT 1"
+newmagicline = "OFFLINEIMAP LocalStatus NOW IN SQLITE, DO NOT MODIFY"
 
 class LocalStatusFolder(BaseFolder):
+    def __del__(self):
+        self.save()
+        self.cursor.close()
+        self.connection.close()
+
     def __init__(self, root, name, repository, accountname):
         self.name = name
         self.root = root
         self.sep = '.'
         self.filename = os.path.join(root, name)
         self.filename = repository.getfolderfilename(name)
-        self.messagelist = None
+        self.messagelist = {}
         self.repository = repository
         self.savelock = threading.Lock()
         self.doautosave = 1
         self.accountname = accountname
         BaseFolder.__init__(self)
+       self.dbfilename = self.filename + '.sqlite'
+
+       # MIGRATE
+       if os.path.exists(self.filename):
+               self.connection = sqlite.connect(self.dbfilename)
+               self.cursor = self.connection.cursor()
+               self.cursor.execute('CREATE TABLE status (id INTEGER PRIMARY KEY, flags VARCHAR(50))')
+               if self.isnewfolder():
+                   self.messagelist = {}
+                   return
+               file = open(self.filename, "rt")
+               self.messagelist = {}
+               line = file.readline().strip()
+               assert(line == magicline)
+               for line in file.xreadlines():
+                   line = line.strip()
+                   uid, flags = line.split(':')
+                   uid = long(uid)
+                   flags = [x for x in flags]
+                   flags.sort()
+                   flags = ''.join(flags)
+                   self.cursor.execute('INSERT INTO status (id,flags) VALUES (?,?)',
+                               (uid,flags))
+               file.close()
+               self.connection.commit()
+               os.rename(self.filename, self.filename + ".old")
+               self.cursor.close()
+               self.connection.close()
+
+       # create new
+       if not os.path.exists(self.dbfilename):
+               self.connection = sqlite.connect(self.dbfilename)
+               self.cursor = self.connection.cursor()
+               self.cursor.execute('CREATE TABLE status (id INTEGER PRIMARY KEY, flags VARCHAR(50))')
+       else:
+               self.connection = sqlite.connect(self.dbfilename)
+               self.cursor = self.connection.cursor()
+
+
 
     def getaccountname(self):
         return self.accountname
@@ -42,7 +89,7 @@ class LocalStatusFolder(BaseFolder):
         return 0
 
     def isnewfolder(self):
-        return not os.path.exists(self.filename)
+        return not os.path.exists(self.dbfilename)
 
     def getname(self):
         return self.name
@@ -58,81 +105,94 @@ class LocalStatusFolder(BaseFolder):
 
     def deletemessagelist(self):
         if not self.isnewfolder():
-            os.unlink(self.filename)
+            self.cursor.close()
+            self.connection.close()
+            os.unlink(self.dbfilename)
 
     def cachemessagelist(self):
-        if self.isnewfolder():
-            self.messagelist = {}
-            return
-        file = open(self.filename, "rt")
-        self.messagelist = {}
-        line = file.readline().strip()
-        assert(line == magicline)
-        for line in file.xreadlines():
-            line = line.strip()
-            uid, flags = line.split(':')
-            uid = long(uid)
-            flags = [x for x in flags]
-            self.messagelist[uid] = {'uid': uid, 'flags': flags}
-        file.close()
+        return
 
     def autosave(self):
         if self.doautosave:
             self.save()
 
     def save(self):
-        self.savelock.acquire()
-        try:
-            file = open(self.filename + ".tmp", "wt")
-            file.write(magicline + "\n")
-            for msg in self.messagelist.values():
-                flags = msg['flags']
-                flags.sort()
-                flags = ''.join(flags)
-                file.write("%s:%s\n" % (msg['uid'], flags))
-            file.flush()
-            os.fsync(file.fileno())
-            file.close()
-            os.rename(self.filename + ".tmp", self.filename)
-
-            try:
-                fd = os.open(os.path.dirname(self.filename), os.O_RDONLY)
-                os.fsync(fd)
-                os.close(fd)
-            except:
-                pass
-
-        finally:
-            self.savelock.release()
+        self.connection.commit()
 
     def getmessagelist(self):
+        if self.isnewfolder():
+            self.messagelist = {}
+            return self.messagelist
+
+        self.messagelist = {}
+        self.cursor.execute('SELECT id,flags from status')
+        for row in self.cursor:
+            flags = [x for x in row[1]]
+            self.messagelist[row[0]] = {'uid': row[0], 'flags': flags}
+
         return self.messagelist
 
+    def uidexists(self,uid):
+       self.cursor.execute('SELECT id FROM status WHERE id=:id',{'id': uid})
+       for row in self.cursor:
+            if(row[0]==uid):
+                return 1
+        return 0
+
+    def getmessageuidlist(self):
+       self.cursor.execute('SELECT id from status')
+       r = []
+       for row in self.cursor:
+            r.append(row[0])
+        return r
+
+    def getmessagecount(self):
+       self.cursor.execute('SELECT count(id) from status');
+       row = self.cursor.fetchone()
+        return row[0]
+
     def savemessage(self, uid, content, flags, rtime):
         if uid < 0:
             # We cannot assign a uid.
             return uid
 
-        if uid in self.messagelist:     # already have it
+        if self.uidexists(uid):     # already have it
             self.savemessageflags(uid, flags)
             return uid
 
         self.messagelist[uid] = {'uid': uid, 'flags': flags, 'time': rtime}
+        flags.sort()
+        flags = ''.join(flags)
+        self.cursor.execute('INSERT INTO status (id,flags) VALUES (?,?)',
+                            (uid,flags))
         self.autosave()
         return uid
 
     def getmessageflags(self, uid):
-        return self.messagelist[uid]['flags']
+        self.cursor.execute('SELECT flags FROM status WHERE id=:id',
+                            {'id': uid})
+       for row in self.cursor:
+            flags = [x for x in row[0]]
+            return flags
+       return flags
 
     def getmessagetime(self, uid):
         return self.messagelist[uid]['time']
 
     def savemessageflags(self, uid, flags):
-        self.messagelist[uid]['flags'] = flags
+        self.messagelist[uid] = {'uid': uid, 'flags': flags}
+        flags.sort()
+        flags = ''.join(flags)
+        self.cursor.execute('UPDATE status SET flags=? WHERE id=?',(flags,uid))
         self.autosave()
 
     def deletemessage(self, uid):
-        if not uid in self.messagelist:
+        if not self.uidexists(uid):
             return
-        del(self.messagelist[uid])
-        self.autosave()
+
+       if uid in self.messagelist:
+            del(self.messagelist[uid])
+
+        self.cursor.execute('DELETE FROM status WHERE id=:id', {'id': uid})
+        return
+





Commit every 50 updateflags requests

Improves syncing large status changes by about a factor of 50

50 is arbitrary. It seems like an acceptable number of updates to have
to redo after a crash.

Index: offlineimap/offlineimap/folder/LocalStatus.py
===================================================================
--- offlineimap.orig/offlineimap/folder/LocalStatus.py  2007-07-06 16:20:24.003939601 +1000
+++ offlineimap/offlineimap/folder/LocalStatus.py       2007-07-06 16:21:48.856775089 +1000
@@ -41,6 +41,7 @@ class LocalStatusFolder(BaseFolder):
         self.savelock = threading.Lock()
         self.doautosave = 1
         self.accountname = accountname
+        self.count = 0 # used for batching operations before commit
         BaseFolder.__init__(self)
        self.dbfilename = self.filename + '.sqlite'
 
@@ -184,7 +185,10 @@ class LocalStatusFolder(BaseFolder):
         flags.sort()
         flags = ''.join(flags)
         self.cursor.execute('UPDATE status SET flags=? WHERE id=?',(flags,uid))
-        self.autosave()
+        self.count = self.count + 1
+        if self.count == 50:
+            self.autosave()
+            self.count = 0
 
     def deletemessage(self, uid):
         if not self.uidexists(uid):

-- 
Stewart Smith (stewart@xxxxxxxxxxxxxxxx)
http://www.flamingspork.com/





