Re: mailing list archiver chewing patches - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: mailing list archiver chewing patches
Date
Msg-id 9837222c1001121054j40bc9302obc1123f5f6c02503@mail.gmail.com
In response to Re: mailing list archiver chewing patches  (Dave Page <dpage@pgadmin.org>)
Responses Re: mailing list archiver chewing patches  (Matteo Beccati <php@beccati.com>)
List pgsql-hackers
On Tue, Jan 12, 2010 at 18:34, Dave Page <dpage@pgadmin.org> wrote:
> On Tue, Jan 12, 2010 at 10:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Joshua D. Drake" <jd@commandprompt.com> writes:
>>> On Tue, 2010-01-12 at 10:24 +0530, Dave Page wrote:
>>>> So just to put this into perspective and give anyone paying attention
>>>> an idea of the pain that lies ahead should they decide to work on
>>>> this:
>>>>
>>>> - We need to import the old archives (of which there are hundreds of
>>>> thousands of messages, the first few years of which have, umm, minimal
>>>> headers).
>>>> - We need to generate thread indexes
>>>> - We need to re-generate the original URLs for backwards compatibility
>>>>
>>>> Now there's encouragement :-)
>>
>>> Or, we just leave the current infrastructure in place and use a new one
>>> for all new messages going forward. We shouldn't limit our ability to
>>> have a decent system due to decisions of the past.
>>
>> -1.  What's the point of having archives?  IMO the mailing list archives
>> are nearly as critical a piece of the project infrastructure as the CVS
>> repository.  We've already established that moving to a new SCM that
>> fails to preserve the CVS history wouldn't be acceptable.  I hardly
>> think that the bar is any lower for mailing list archives.
>>
>> Now I think we could possibly skip the requirement suggested above for
>> URL compatibility, if we just leave the old archives on-line so that
>> those URLs all still resolve.  But if we can't load all the old messages
>> into the new infrastructure, it'll basically be useless for searching
>> purposes.
>>
>> (Hmm, re-reading what you said, maybe we are suggesting the same thing,
>> but it's not clear.  Anyway my point is that Dave's first two
>> requirements are real.  Only the third might not be.)
>
> The third isn't actually that hard to do in theory. The
> message numbers are basically the zero-based position in the mbox
> file, and the rest of the URL is obvious.
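A rough sketch of the numbering scheme described above, for anyone who wants to experiment: walk an mbox file in order and use each message's zero-based position as its message number. The function name, the base URL, and the `msgNNNNN.php` path layout are assumptions made for illustration, not details taken from this thread. Messages without a Message-ID (the early archives have minimal headers) still consume a position but get no map entry.

```python
import mailbox


def legacy_urls(mbox_path, list_name, month,
                base="http://archives.postgresql.org"):
    """Map Message-ID -> old-style URL using zero-based mbox position.

    The URL layout below is an assumed example, not a confirmed format.
    """
    urls = {}
    # create=False: fail instead of silently creating an empty mbox
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for position, message in enumerate(box):
            msgid = (message.get("Message-ID") or "").strip().strip("<>")
            if not msgid:
                # No usable header; the position is still consumed
                continue
            urls[msgid] = "%s/%s/%s/msg%05d.php" % (
                base, list_name, month, position)
    finally:
        box.close()
    return urls
```

A redirect layer could invert this map (URL to Message-ID) and hand the Message-ID to whatever lookup the new archive provides.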

The third part is trivial. The search system already does 95% of it.
I've already implemented exactly that kind of redirect thing on top of
the search code once just as a poc, and it was less than 30 minutes of
hacking. Can't seem to find the script ATM though, but you get the
idea.

Let's not focus on that part, we can easily solve that.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

