Re: mailing list archiver chewing patches - Mailing list pgsql-hackers

From Aidan Van Dyk
Subject Re: mailing list archiver chewing patches
Msg-id 20100112201647.GC18076@oak.highrise.ca
In response to Re: mailing list archiver chewing patches  (Matteo Beccati <php@beccati.com>)
Responses Re: mailing list archiver chewing patches  (Dimitri Fontaine <dfontaine@hi-media.com>)
List pgsql-hackers
I'll note that the whole idea of an "email archive" interface might be a
very good "advocacy" project as well.  AOX might not be a perfect fit,
but it could be a good learning experience... Really, all the PG mail
archives need is:

1) A nice normalized DB schema representing mail messages and their
   relations to other messages and "recipients" (or "folders")

2) An "injector" that can parse an email message, decompose it into
   the various parts/tables of the DB schema, and insert it

3) A nice set of SQL queries to return messages, parts, threads, and
   folders based on $criteria (search, id, folder, etc)

4) A web interface to view the messages/thread/parts #3 returns
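
As a rough illustration of #1 and #2, here is a minimal sketch in Python.
Everything in it is an assumption made for illustration: the table names
(messages, recipients, parts), the column layout, and the inject() helper
are hypothetical, and SQLite stands in for PostgreSQL only so the example
runs with the standard library alone. It is not AOX's schema.

```python
import sqlite3
from email import message_from_string
from email.utils import getaddresses

# Hypothetical, deliberately small schema: one row per message, one row
# per recipient address, one row per leaf MIME part.
SCHEMA = """
CREATE TABLE messages (
    id          INTEGER PRIMARY KEY,
    message_id  TEXT UNIQUE NOT NULL,
    in_reply_to TEXT,
    sender      TEXT,
    subject     TEXT
);
CREATE TABLE recipients (
    message_pk INTEGER NOT NULL REFERENCES messages(id),
    address    TEXT NOT NULL
);
CREATE TABLE parts (
    message_pk   INTEGER NOT NULL REFERENCES messages(id),
    position     INTEGER NOT NULL,
    content_type TEXT NOT NULL,
    body         BLOB
);
"""


def inject(conn, raw):
    """Parse one raw RFC 822 message and spread it over the tables."""
    msg = message_from_string(raw)
    cur = conn.execute(
        "INSERT INTO messages (message_id, in_reply_to, sender, subject) "
        "VALUES (?, ?, ?, ?)",
        (msg["Message-ID"], msg["In-Reply-To"], msg["From"], msg["Subject"]),
    )
    pk = cur.lastrowid

    # To and Cc may each appear several times and hold several addresses.
    addrs = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    conn.executemany(
        "INSERT INTO recipients (message_pk, address) VALUES (?, ?)",
        [(pk, addr) for _name, addr in addrs],
    )

    # Store only leaf parts; multipart containers carry no body themselves.
    leaves = [p for p in msg.walk() if not p.is_multipart()]
    for pos, part in enumerate(leaves):
        conn.execute(
            "INSERT INTO parts (message_pk, position, content_type, body) "
            "VALUES (?, ?, ?, ?)",
            (pk, pos, part.get_content_type(), part.get_payload(decode=True)),
        )
    return pk
```

Bodies are stored as decoded bytes here, which is where the text vs
bytea question would come up in a real PostgreSQL schema.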

The largest part of this is #1, but a good schema would be a very good
candidate to show off some of PG's more powerful features in a way that
"others" could see (like the movie store sample somewhere), such as:

 1) full text search
 2) text vs bytea handling (thinking of all the mime parts, and
    encoding, etc)
 3) CTEs, ltree, recursion, etc, for threading/searching
 4) Triggers for "materialized views" (for quick threading/folder
    queries)
 5) expression indexes
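
To make the threading item concrete, here is a small sketch of a
recursive CTE that walks replies from a root message. The messages table
and its columns are hypothetical, and SQLite is used only as a runnable
stand-in; the WITH RECURSIVE form is the part that carries over to
PostgreSQL.

```python
import sqlite3

# Walk the reply tree from one root. "path" is the chain of message ids
# from the root down, so sorting on it gives a depth-first thread order.
THREAD_QUERY = """
WITH RECURSIVE thread(message_id, depth, path) AS (
    SELECT message_id, 0, message_id
    FROM messages
    WHERE message_id = ?
  UNION ALL
    SELECT m.message_id, t.depth + 1, t.path || '/' || m.message_id
    FROM messages m
    JOIN thread t ON m.in_reply_to = t.message_id
)
SELECT message_id, depth FROM thread ORDER BY path
"""


def make_demo_db():
    """Build an in-memory table with one four-message thread."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (message_id TEXT PRIMARY KEY, in_reply_to TEXT)"
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?)",
        [("a", None), ("b", "a"), ("c", "a"), ("d", "b")],
    )
    return conn


def fetch_thread(conn, root_id):
    """Return (message_id, depth) rows for the thread rooted at root_id."""
    return conn.execute(THREAD_QUERY, (root_id,)).fetchall()


if __name__ == "__main__":
    for message_id, depth in fetch_thread(make_demo_db(), "a"):
        print("  " * depth + message_id)
```

Sorting on a text path is only a stand-in for proper ordering (by date,
say); an ltree column or a trigger-maintained thread table would be the
alternatives to weigh against it.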
 

a.

* Matteo Beccati <php@beccati.com> [100112 14:56]:

> Having played with it, here's my feedback about AOX:
>
> pros:
> - seemed to be working reliably;
> - does most of the dirty job of parsing emails, splitting parts, etc
> - highly normalized schema
> - thread support (partial?)
>
> cons:
> - directly publishing the live email feed might not be desirable
> - queries might end up being a bit complicated for simple tasks
> - might not be easy to add additional processing in the workflow

> If there isn't a fully usable thread hierarchy I was thinking more of
> ltree, mainly because I've successfully used it in the past and I haven't
> had enough time yet to look at CTEs. But if performance is comparable I  
> don't see a reason why we shouldn't use them.

> With all that said, I can't promise anything as it all depends on how  
> much spare time I have, but I can proceed with the evaluation if you  
> think it's useful. I have a feeling that AOX is not truly the right tool  
> for the job, but we might be able to customise it to suit our needs. Are  
> there any other requirements that weren't specified?

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
