Re: mailing list archiver chewing patches - Mailing list pgsql-hackers

From Matteo Beccati
Subject Re: mailing list archiver chewing patches
Msg-id 4B769C70.8060106@beccati.com
In response to Re: mailing list archiver chewing patches  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 01/02/2010 17:28, Tom Lane wrote:
> Matteo Beccati<php@beccati.com>  writes:
>> My main concern is that we'd need to overcomplicate the thread detection
>> algorithm so that it better deals with delayed messages: as it currently
>> works, the replies to a missing message get linked to the
>> "grand-parent". Injecting the missing message afterwards will put it at
>> the same level as its replies. If it happens only once in a while I
>> guess we can live with it, but definitely not if it happens tens of
>> times a day.
>
> That's quite common unfortunately --- I think you're going to need to
> deal with the case.  Even getting a direct feed from the mail relays
> wouldn't avoid it completely: consider cases like
>
>     * A sends a message
>     * B replies, cc'ing A and the list
>     * B's reply to list is delayed by greylisting
>     * A replies to B's reply (cc'ing list)
>     * A's reply goes through immediately
>     * B's reply shows up a bit later
>
> That happens pretty frequently IME.

I've improved the threading algorithm by keeping an ordered backlog of 
unresolved references, i.e. when a message arrives:

1. Search for a parent message using:

1a. In-Reply-To header. If the referenced message is not found, insert its 
Message-Id into the backlog table with position 0

1b. References header. For each missing referenced message, insert its 
Message-Id into the backlog table with position N

1c. MS Exchange Thread-Index and Thread-Topic headers

2. Message is stored along with its parent ID, if any.

3. Look up the new message's Message-Id in the backlog table. For each 
message that was waiting for it at position n, update that message's 
parent field to point to the new message and remove that message's 
backlog entries with position >= n.
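
To make the steps above concrete, here is a minimal in-memory sketch in 
Python. It is not the actual archiver code: the Archive class, its field 
names and the example Message-Ids are hypothetical, the backlog "table" is 
a plain list, step 1c (Thread-Index / Thread-Topic) is left out, and I am 
assuming that References is walked from the nearest ancestor to the 
farthest and that the walk stops at the first ancestor already stored.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    # Message-Id -> parent Message-Id (None when no parent is known yet)
    parents: dict = field(default_factory=dict)
    # Backlog rows: (missing Message-Id, waiting Message-Id, position)
    backlog: list = field(default_factory=list)

    def add(self, msg_id, in_reply_to=None, references=()):
        # Step 1: build the candidate parents, nearest ancestor first.
        # In-Reply-To gets position 0; References follow, newest first
        # (assumption: the header lists the oldest ancestor first).
        candidates = []
        head = [in_reply_to] if in_reply_to else []
        for ref in head + list(reversed(references)):
            if ref != msg_id and ref not in candidates:
                candidates.append(ref)

        parent = None
        for position, ref in enumerate(candidates):
            if ref in self.parents:
                parent = ref  # closest ancestor we already have
                break
            # Ancestor not seen yet: remember that msg_id is waiting for it.
            self.backlog.append((ref, msg_id, position))

        # Step 2: store the message along with its parent, if any.
        self.parents[msg_id] = parent

        # Step 3: re-parent messages that were waiting for this one and
        # drop their backlog rows at this position or farther away.
        hits = [row for row in self.backlog if row[0] == msg_id]
        for _missing, waiting, n in hits:
            self.parents[waiting] = msg_id
            self.backlog = [
                row for row in self.backlog
                if not (row[1] == waiting and row[2] >= n)
            ]


# Tom's greylisting scenario: A's second message (m3) arrives before
# B's delayed reply (m2) that it answers.
archive = Archive()
archive.add("m1")
archive.add("m3", in_reply_to="m2", references=["m1", "m2"])
archive.add("m2", in_reply_to="m1", references=["m1"])
```

In this run m3 is first attached to the "grand-parent" m1 and gets one 
backlog row for m2 at position 0; when m2 finally shows up, m3 is moved 
under it and the row is removed. Rows with a position lower than n are 
kept on purpose, since a closer ancestor may still arrive later.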

Now I just need some time to do a final cleanup, and then I'll be ready to 
publish the code, which hopefully will be clearer than my words ;)


Cheers
-- 
Matteo Beccati

Development & Consulting - http://www.beccati.com/

