Re: mailing list archiver chewing patches - Mailing list pgsql-hackers

From	Magnus Hagander
Subject	Re: mailing list archiver chewing patches
Date	January 12, 2010 17:54:41
Msg-id	9837222c1001121054j40bc9302obc1123f5f6c02503@mail.gmail.com
In response to	Re: mailing list archiver chewing patches (Dave Page <dpage@pgadmin.org>)
Responses	Re: mailing list archiver chewing patches (Matteo Beccati <php@beccati.com>)
List	pgsql-hackers

On Tue, Jan 12, 2010 at 18:34, Dave Page <dpage@pgadmin.org> wrote:
> On Tue, Jan 12, 2010 at 10:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Joshua D. Drake" <jd@commandprompt.com> writes:
>>> On Tue, 2010-01-12 at 10:24 +0530, Dave Page wrote:
>>>> So just to put this into perspective and give anyone paying attention
>>>> an idea of the pain that lies ahead should they decide to work on
>>>> this:
>>>>
>>>> - We need to import the old archives (of which there are hundreds of
>>>> thousands of messages, the first few years of which have, umm, minimal
>>>> headers).
>>>> - We need to generate thread indexes
>>>> - We need to re-generate the original URLs for backwards compatibility
>>>>
>>>> Now there's encouragement :-)
>>
>>> Or, we just leave the current infrastructure in place and use a new one
>>> for all new messages going forward. We shouldn't limit our ability to
>>> have a decent system due to decisions of the past.
>>
>> -1.  What's the point of having archives?  IMO the mailing list archives
>> are nearly as critical a piece of the project infrastructure as the CVS
>> repository.  We've already established that moving to a new SCM that
>> fails to preserve the CVS history wouldn't be acceptable.  I hardly
>> think that the bar is any lower for mailing list archives.
>>
>> Now I think we could possibly skip the requirement suggested above for
>> URL compatibility, if we just leave the old archives on-line so that
>> those URLs all still resolve.  But if we can't load all the old messages
>> into the new infrastructure, it'll basically be useless for searching
>> purposes.
>>
>> (Hmm, re-reading what you said, maybe we are suggesting the same thing,
>> but it's not clear.  Anyway my point is that Dave's first two
>> requirements are real.  Only the third might not be.)
>
> The third isn't actually that hard to do in theory. The
> message numbers are basically the zero-based position in the mbox
> file, and the rest of the URL is obvious.

The third part is trivial. The search system already does 95% of it.
I've already implemented exactly that kind of redirect on top of the
search code once, just as a PoC, and it took less than 30 minutes of
hacking. I can't seem to find the script at the moment, but you get
the idea.

Let's not focus on that part; we can easily solve it.
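
As a rough illustration of the mapping Dave describes (zero-based
position in the mbox file to a legacy URL), a minimal Python sketch
might look like the following. The function name legacy_redirects, the
URL template, and the one-mbox-file-per-month layout are assumptions
made for illustration; this is not the actual archive or search code.

```python
import mailbox


def legacy_redirects(mbox_path, list_name, year_month):
    """Return {legacy URL path: Message-ID} for one monthly mbox file.

    The URL template used below is an assumption for illustration only.
    """
    redirects = {}
    box = mailbox.mbox(mbox_path, create=False)
    try:
        # enumerate() yields the zero-based position of each message in the file.
        for position, message in enumerate(box):
            msgid = message.get("Message-ID")
            if msgid is None:
                # Very old messages may have minimal headers; skip them here.
                continue
            path = f"/{list_name}/{year_month}/msg{position:05d}.php"
            redirects[path] = msgid.strip().strip("<>")
    finally:
        box.close()
    return redirects
```

A redirect handler could then look up the requested legacy path in
this mapping and send the client to wherever the new system serves
that Message-ID. Messages without a Message-ID header still consume a
position, so the numbering of later messages is unaffected.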


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
