Re: mailing list archiver chewing patches - Mailing list pgsql-hackers
From | Matteo Beccati |
---|---|
Subject | Re: mailing list archiver chewing patches |
Date | |
Msg-id | 4B4CDD9E.7010204@beccati.com |
In response to | Re: mailing list archiver chewing patches (Magnus Hagander <magnus@hagander.net>) |
List | pgsql-hackers |
On 12/01/2010 21:04, Magnus Hagander wrote:
> On Tue, Jan 12, 2010 at 20:56, Matteo Beccati <php@beccati.com> wrote:
>> On 12/01/2010 10:30, Magnus Hagander wrote:
>>>
>>> The problem is usually with strange looking emails with 15 different
>>> MIME types. If we can figure out the proper way to render that, the
>>> rest really is just a SMOP.
>>
>> Yeah, I was expecting some, but all the messages I've looked at seemed to be
>> working ok.
>
> Have you been looking at old or new messages? Try grabbing a couple of
> MBOX files off archives.postgresql.org from several years back, you're
> more likely to find weird MUAs than I think.

Both. pgsql-hackers and -general are subscribed and getting new emails, and pgsql-www is just an import of the archives:

http://archives.beccati.org/pgsql-www/by/date (sorry, no paging)

(I just fixed a 500 error caused by a missing helper table; I had been playing with the db a bit.)

>>> (BTW, for something to actually be used In Production (TM), we want
>>> something that uses one of our existing frameworks. So don't go
>>> overboard in code-wise implementations on something else - proof of
>>> concept on something else is always ok, of course)
>>
>> OK, that's something I didn't know, even though I expected some kind of
>> limitations. Could you please elaborate a bit more (i.e. where to find
>> info)?
>
> Well, the framework we're moving towards is built on top of django, so
> that would be a good first start.
>
> There is also whatever the commitfest thing is built on, but I'm told
> that's basically no framework.

I'm afraid that's outside of my expertise. But I can get as far as having a proof of concept and the required queries / php code.

>> Having played with it, here's my feedback about AOX:
>>
>> pros:
>> - seemed to be working reliably;
>> - does most of the dirty job of parsing emails, splitting parts, etc
>> - highly normalized schema
>> - thread support (partial?)
>
> A killer will be if that thread support is enough. If we have to build
> that completely ourselves, it'll take a lot more work.

Looks like we need to populate a helper table with hierarchy information, unless Abhijit has a better idea and knows how to get it from the aox main schema.

>> cons:
>> - directly publishing the live email feed might not be desirable
>
> Why not?

The scenario I was thinking of was the creation of a static snapshot, and the inconsistencies that might occur if the threads get updated while it is being created.

>> - queries might end up being a bit complicated for simple tasks
>
> As long as we don't have to hit them too often, which is solvable
> with caching. And we do have a pretty good RDBMS to run the queries on
> :)

True :)

>>> I don't think you can trust the NNTP gateway now or in the past,
>>> messages are sometimes lost there. The mbox files are as complete as
>>> anything we'll ever get.
>>
>> Importing the whole pgsql-www archive with a perl script that bounces
>> messages via SMTP took about 30 minutes. Maybe there's even a way to skip
>> SMTP, I haven't looked into it that much.
>
> Um, yes. There is an MBOX import tool.

Cool.

>> With all that said, I can't promise anything as it all depends on how much
>> spare time I have, but I can proceed with the evaluation if you think it's
>> useful. I have a feeling that AOX is not truly the right tool for the job,
>> but we might be able to customise it to suit our needs. Are there any other
>> requirements that weren't specified?
>
> Well, I think we want to avoid customizing it. Using a custom
> frontend, sure. But we don't want to end up customizing the
> parser/backend. That's the road to unmaintainability.

Sure. I guess my wording wasn't right... I was thinking more about adding new tables, materialized views or whatever else might be missing to make it fit our purpose.

Cheers
-- 
Matteo Beccati

Development & Consulting - http://www.beccati.com/
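The helper table with hierarchy information mentioned above could, in principle, be derived from the standard threading headers (Message-ID, In-Reply-To, References). The following is a minimal Python sketch, assuming the messages are available as parsed `email.message.Message` objects; the function names and the `(msg_id, parent_id, root_id, depth)` row layout are hypothetical and are not taken from the AOX schema.

```python
from email import message_from_string
from email.message import Message
from typing import Optional


def parent_id(msg: Message) -> Optional[str]:
    """Return the Message-ID this message replies to, if any.

    In-Reply-To is preferred; otherwise the last entry of References
    is used, since it names the immediate parent.
    """
    in_reply_to = msg.get("In-Reply-To")
    if in_reply_to:
        return in_reply_to.strip()
    references = msg.get("References")
    if references:
        return references.split()[-1]
    return None


def build_hierarchy(messages: list[Message]) -> list[tuple[str, Optional[str], str, int]]:
    """Return hypothetical (msg_id, parent_id, root_id, depth) helper-table rows."""
    raw_parents = {msg["Message-ID"].strip(): parent_id(msg) for msg in messages}
    # A parent that is not in the archive is treated as absent, so the
    # orphaned reply becomes the root of its own thread.
    parents = {
        mid: (pid if pid in raw_parents else None)
        for mid, pid in raw_parents.items()
    }

    rows = []
    for mid, pid in parents.items():
        root, depth, seen = mid, 0, {mid}
        # Walk up the parent chain; `seen` guards against reference cycles.
        while parents[root] is not None and parents[root] not in seen:
            root = parents[root]
            seen.add(root)
            depth += 1
        rows.append((mid, pid, root, depth))
    return rows
```

Storing the root and depth alongside the parent lets a simple `ORDER BY root_id, ...` query fetch a whole thread without recursive queries; whether AOX already keeps enough information to make such a table unnecessary is the open question in this thread.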