On 01/02/2010 17:28, Tom Lane wrote:
> Matteo Beccati<php@beccati.com> writes:
>> My main concern is that we'd need to overcomplicate the thread detection
>> algorithm so that it better deals with delayed messages: as it currently
>> works, the replies to a missing message get linked to the
>> "grand-parent". Injecting the missing message afterwards will put it at
>> the same level as its replies. If it happens only once in a while I
>> guess we can live with it, but definitely not if it happens tens of
>> times a day.
>
> That's quite common unfortunately --- I think you're going to need to
> deal with the case. Even getting a direct feed from the mail relays
> wouldn't avoid it completely: consider cases like
>
> * A sends a message
> * B replies, cc'ing A and the list
> * B's reply to list is delayed by greylisting
> * A replies to B's reply (cc'ing list)
> * A's reply goes through immediately
> * B's reply shows up a bit later
>
> That happens pretty frequently IME.

I've improved the threading algorithm by keeping an ordered backlog of
unresolved references, i.e. when a message arrives:

1. Search for a parent message using:
   1a. In-Reply-To header. If the referenced message is not found, insert
       its Message-Id into the backlog table with position 0
   1b. References header. For each missing referenced message, insert its
       Message-Id into the backlog table with position N
   1c. MS Exchange Thread-Index and Thread-Topic headers
2. Store the message along with its parent ID, if any.
3. Compare the Message-Id header with the backlog table. Update the
   parent field of any referencing message and clean up positions >= n in
   the references table.
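
To make the steps a bit more concrete, here is a rough in-memory sketch in
Python. It is only an illustration under several assumptions, not the actual
code: the `Threader` class and its field names are hypothetical, the backlog
and the references table of step 3 are treated as one structure, References
is assumed to list the oldest ancestor first, the search stops at the nearest
ancestor that is already stored, and the Thread-Index fallback (1c) is left
out.

```python
from dataclasses import dataclass, field


@dataclass
class Threader:
    # message_id -> parent message_id (None for thread roots)
    parents: dict = field(default_factory=dict)
    # backlog rows: (missing_id, child_id, position)
    backlog: list = field(default_factory=list)

    def add(self, message_id, in_reply_to=None, references=()):
        # Candidate ancestors, nearest first: In-Reply-To is position 0,
        # then References read right to left (assumed oldest-first order).
        candidates = []
        for ref in [in_reply_to, *reversed(references)]:
            if ref and ref not in candidates:
                candidates.append(ref)

        # Step 1: take the nearest known ancestor as parent; every nearer
        # ancestor that is still missing goes to the backlog.
        parent = None
        for position, ref in enumerate(candidates):
            if ref in self.parents:
                parent = ref
                break
            self.backlog.append((ref, message_id, position))

        # Step 2: store the message with its parent, if any.
        self.parents[message_id] = parent

        # Step 3: re-parent messages that were waiting for this one and
        # drop their backlog rows at positions >= the resolved one.
        resolved = {c: p for (m, c, p) in self.backlog if m == message_id}
        for child in resolved:
            self.parents[child] = message_id
        self.backlog = [
            (m, c, p) for (m, c, p) in self.backlog
            if not (c in resolved and p >= resolved[c])
        ]


# The greylisting case quoted above: B's reply (b1) shows up after A's
# reply to it (a2).
t = Threader()
t.add("a1")
t.add("a2", in_reply_to="b1", references=["a1", "b1"])  # hangs off a1 for now
t.add("b1", in_reply_to="a1", references=["a1"])        # a2 moves under b1
```

Rows at positions below the resolved one are kept on purpose: they point at
nearer ancestors that may still arrive and would then become the better
parent.
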

Now I just need some time to do a final cleanup and I'll be ready to
publish the code, which hopefully will be clearer than my words ;)

Cheers
--
Matteo Beccati
Development & Consulting - http://www.beccati.com/